ABSTRACT


Pattern  Search  Algorithms  for  Mixed  Variable 
General  Constrained  Optimization  Problems 


by 

Mark  Aaron  Abramson 


A new class of algorithms for solving nonlinearly constrained mixed variable optimization
problems is presented. The Audet-Dennis Generalized Pattern Search (GPS)
algorithm for bound constrained mixed variable optimization problems is extended to
problems with general nonlinear constraints by incorporating a filter, in which new
iterates are accepted whenever they decrease the incumbent objective function value or
constraint violation function value. Additionally, the algorithm can exploit any available
derivative information (or rough approximation thereof) to speed convergence
without sacrificing the flexibility often employed by GPS methods to find better local
optima. In generalizing existing GPS algorithms, the new theoretical convergence
results presented here reduce seamlessly to existing results for more specific classes of
problems. While no local continuity or smoothness assumptions are made, a hierarchy
of theoretical convergence results is given, in which the assumptions dictate what
can be proved about certain limit points of the algorithm. A new Matlab® software
package was developed to implement these algorithms. Numerical results are provided
for several nonlinear optimization problems from the CUTE test set, as well as
Chapter 1
Introduction 

1.1  Mixed  Variable  Optimization  Problems 

Optimization problems arise in engineering when the goal is to find a design, expressed
as a vector of variables, that minimizes or maximizes some function that acts
as a measure of the merit of the design. Typically, there are also constraints on the
variables, expressed as inequalities, to limit the choices of values the variables are
permitted to take on. One of the broadest classes of optimization problems is known
as mixed-integer nonlinear programming (MINLP) problems, which are characterized
by nonlinear objective and constraint functions with a mixture of integer and continuous
variables. An MINLP problem without discrete variables is simply a nonlinear
programming (NLP) problem. Similar to MINLP problems is a more general class
of problems only recently studied rigorously, known as mixed variable programming
(MVP) problems, in which some of the discrete variables are categorical.

Categorical variables are those that take on values from a pre-defined list of categories.
Some common examples are color, shape, or type of material. They can
be (and often are) assigned discrete values, but the values usually have no inherent
ordering, so the numbers themselves carry no meaning. For example, in a structural
design problem with material type as a variable, discrete values might be assigned,
such as 1 = steel, 2 = aluminum, etc. When considering potential iterative solution
approaches, categorical variables can be thought of as variables whose “discreteness”
must be satisfied at every iteration, and not just at the solution. In many applications,
these variables are parameters to a large and expensive simulation, based on differential
equation models, in which the simulation software simply rejects values that do not
come from a pre-defined list.

The traditional approach to engineering design problems with categorical variables
is to conduct parametric studies, in which designs are optimized for specific
fixed values of the categorical variables and then compared against each other. Test
values are typically chosen based on an engineer’s judgement and familiarity with the
problem. Problems with only one categorical variable that can take on only a few
values are usually not difficult to solve, since the possible choices for the categorical
variable value can be exhaustively enumerated at relatively low cost. However, problems
with many categorical variables that can take on several different values can
quickly become too costly to solve with this approach.
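
To make the cost of the parametric-study approach concrete, the following Python sketch counts and enumerates the combinations of categorical values, each of which would require its own continuous optimization. The variable names and category lists are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod

# Hypothetical categorical variables and their allowed values (illustration only).
categories = {
    "material": ["steel", "aluminum", "titanium"],
    "shape": ["round", "square"],
    "color": ["red", "green", "blue", "black"],
}


def count_parametric_runs(cats):
    """Number of continuous subproblems an exhaustive parametric study would solve."""
    return prod(len(values) for values in cats.values())


def enumerate_designs(cats):
    """Yield every combination of categorical values as a dict."""
    names = list(cats)
    for combo in product(*(cats[name] for name in names)):
        yield dict(zip(names, combo))


n_runs = count_parametric_runs(categories)
designs = list(enumerate_designs(categories))
```

The count is the product of the list lengths, so it grows geometrically with the number of categorical variables: ten variables with five values each would already require 5^10 = 9,765,625 separate optimizations.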

In addition to the general nature of the variables, many optimization problems do
not have well-behaved objective and constraint functions. Specifically, functions may
have discontinuities, undefined regions, or take on infinite value at specific points. In
fact, in some engineering simulations, function calls may even result in the function
simply not returning a value (e.g., see Booker et al. [20]). This happens, for example,
when each function evaluation requires the solution of a system of differential
equations, and the method used to numerically solve the system fails to generate a
solution. Classical gradient-based or Newton-based methods cannot be used unmodified
on problems of this type; indeed, few existing methods can. Although the objective
and constraint functions will not be assumed to have any continuity or smoothness
properties, the strength of the results that can be shown both theoretically and
computationally will depend on such properties.

Another challenging aspect of MVP problems is that changes in the discrete variables
can mean a change in the constraints, and even in the dimensions of the problem.
For example, if an engineer wants to build a structure, different materials will have
different characteristics, such as bounds on thicknesses. Such is the case in [89], where
the number of continuous variables is a function of the number of heat intercepts in
a thermal insulation system, the latter being treated as a categorical variable. The
constraint set can change, since each continuous variable has simple bounds. In fact,
in this problem, even the dimension of the categorical variable space can change,
since the types of insulators between intercepts are categorical design variables, and
a change in the number of intercepts changes the number of insulators. In Chapter 7,
the problem is expanded from that of [89] to include nonlinear constraints on system
weight, stress, and thermal contraction. If constraints were placed, for example, on
the thermal contraction of each insulator, then the number of these constraints would
also change, depending on the number of intercepts used.

Allowing for this flexibility is a nice feature, but it complicates the mathematical
formulation of the problem. In order to fully describe it, we will partition a point $x$
into its continuous and discrete parts $x^c$ and $x^d$, respectively; i.e., $x = (x^c, x^d)$, where
$x^d \in \mathbb{Z}^{n^d}$ and $x^c \in \mathbb{R}^{n^c}$, and where $n^c$ and $n^d$ denote the maximum dimensions of
the continuous and discrete variables, respectively. By convention, we simply ignore
unused variables.

Similarly, since the number of nonlinear constraints can vary as well, we let $p < \infty$
be the maximum number of constraints, and by convention, any unused constraints
are set to the zero function, so that they are always trivially satisfied.

The  MVP  problem  of  interest  in  this  research  is  expressed  as  follows: 


$$\min_{x \in X} \; f(x) \quad \text{s.t.} \quad C(x) \le 0, \tag{1.1}$$

where $f : X \to \mathbb{R} \cup \{\infty\}$, and $C : X \to (\mathbb{R} \cup \{\infty\})^p$ with $C = (C_1, C_2, \ldots, C_p)^T$.
The domain $X$ is partitioned into continuous and discrete variable spaces $X^c$ and $X^d$,
respectively, where $X^c \subseteq \mathbb{R}^{n^c}$ and $X^d \subseteq \mathbb{Z}^{n^d}$. That is, $X = X^c \times X^d$, but constraints
can be functions of both $x^c$ and $x^d$.

Furthermore, $X^c$ is defined by bound and simple linear constraints; i.e., for each
fixed set of discrete variable values $x^d$, we have

$$X^c(x^d) = \{x^c \in \mathbb{R}^{n^c} : \ell(x^d) \le A(x^d)\,x^c \le u(x^d)\}, \tag{1.2}$$

where $A(x^d) \in \mathbb{R}^{m^c \times n^c}$, $\ell(x^d), u(x^d) \in (\mathbb{R} \cup \{\pm\infty\})^{m^c}$, and $\ell(x^d) \le u(x^d)$. We denote
the entire feasible region by $\Omega = \{x \in X : C(x) \le 0\}$, with its corresponding
partitioning $\Omega = (\Omega^c, \Omega^d)$.

Again, the key point here is that changes in the discrete variable values can result
in changes to the constraints; hence, the explicit dependence of $X^c$ on $x^d$ in (1.2).
Note that if $n^d = 0$, the problem reduces to a standard NLP problem, in which $\ell$, $A$,
and $u$ do not change. For convenience, we now omit the explicit dependence on $x^d$ in
this document, even though it is understood for MVP problems.
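
As an illustration of how the bounds on the continuous variables can depend on a categorical variable, the following Python sketch encodes a tiny mixed variable problem with one continuous variable and one categorical variable. All names, bounds, densities, and the weight limit are hypothetical, and $A(x^d)$ is taken to be the identity so that only simple bounds appear.

```python
import math

# Hypothetical thickness bounds indexed by the categorical value (illustration only).
BOUNDS = {
    "steel": (1.0, 5.0),
    "aluminum": (2.0, 8.0),
}

# Hypothetical material densities used by the nonlinear constraint below.
DENSITY = {"steel": 7.8, "aluminum": 2.7}


def in_Xc(xc, xd):
    """Check l(xd) <= xc <= u(xd), with A(xd) taken to be the identity."""
    if xd not in BOUNDS:
        return False  # values outside the pre-defined list are rejected
    lower, upper = BOUNDS[xd]
    return all(lower <= v <= upper for v in xc)


def C(xc, xd):
    """Hypothetical constraint vector C(x) <= 0: a weight limit depending on xd."""
    return [DENSITY[xd] * xc[0] - 20.0]


def f(xc, xd):
    """Objective with values in R U {inf}; infinite where it is undefined."""
    if not in_Xc(xc, xd):
        return math.inf
    return (xc[0] - 3.0) ** 2


def is_feasible(xc, xd):
    """Membership in Omega = {x in X : C(x) <= 0}."""
    return in_Xc(xc, xd) and all(ci <= 0.0 for ci in C(xc, xd))
```

Changing the categorical value changes both the bounds and the constraint: a thickness of 3.0 violates the weight limit for steel but satisfies it for aluminum.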

1.2  Optimality  Conditions 

In  order  to  prove  certain  theoretical  results  later,  some  basic  optimization  terminology 
is  now  provided.  The  concepts  of  tangent  and  normal  cones  are  first  defined,  after 
which  the  Karush-Kuhn-Tucker  (KKT)  and  Fritz  John  first-order  necessary  conditions 
for optimality are given. For the purpose of clarity, and in this section only, we
assume that the constraints $C$ include the linear constraints that would normally
define $X$, and thus, $X^c = \mathbb{R}^{n^c}$. Note that this is allowed within the construction of
(1.1).

It is also necessary to point out that many definitions and theorems in this document,
including the description of the KKT and Fritz John conditions below, have
been modified slightly so that they can be applied seamlessly to MVP problems.
Specifically, we add the phrase, with respect to the continuous variables, to mean
that a stated condition applies to the continuous variables while holding the discrete
variables constant. For example, the gradient of $f$ with respect to the continuous
variables is denoted $\nabla^c f$, and a function $f$ is differentiable at $x$ with respect to the
continuous variables if $\nabla^c f(x)$ exists.

The  first  definition  can  be  found  in  [114]. 

Definition 1.1 A vector $w \in \mathbb{R}^n$ is tangent to the set $Y$ at $x \in Y$ if
for all sequences $\{x_i\} \subset Y$ with $x_i \to x$, and all positive scalar sequences
$t_i \downarrow 0$, there exists a sequence $w_i \to w$ such that $x_i + t_i w_i \in Y$ for all $i$.

The tangent cone $T_Y(x)$ is the collection of all tangent vectors to $Y$ at $x$.

If $Y$ is a convex set (i.e., if $v, w \in Y$, then $\alpha v + (1 - \alpha) w \in Y$ for all $\alpha \in [0, 1]$), then
a straightforward application of Definition 1.1 yields

$$T_Y(x) = \mathrm{cl}\{t(w - x) : t \ge 0,\ w \in Y\}. \tag{1.3}$$

This latter definition will be used often, since $X^c$ is a convex set.

The  following  definition  (see  [133])  will  be  useful  in  defining  the  normal  cone  and 
other  cones  described  in  this  document. 

Definition 1.2 The polar of a cone $V$, denoted by $V^\circ$, is given by

$$V^\circ = \{v \in \mathbb{R}^n : v^T w \le 0 \ \forall\, w \in V\}. \tag{1.4}$$

The normal cone is defined as the polar of the tangent cone [114]; when the tangent
cone is a subspace, this polar is simply its orthogonal complement:

Definition 1.3 The normal cone to $Y$ at $x$, denoted by $N_Y(x)$, is given by

$$N_Y(x) = T_Y^\circ(x) = \{v \in \mathbb{R}^n : v^T w \le 0 \ \forall\, w \in T_Y(x)\}. \tag{1.5}$$

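
When a cone is generated by finitely many vectors, membership in its polar can be checked on the generators alone, since $v^T w \le 0$ for each generator implies the same inequality for every nonnegative combination of them. The Python sketch below uses this to test membership in a normal cone; the set $Y$, the point $x$, and the function name are assumptions made for illustration.

```python
def in_polar(v, generators, tol=1e-12):
    """True if v^T w <= 0 for every generator w of a finitely generated cone."""
    return all(
        sum(vi * wi for vi, wi in zip(v, w)) <= tol
        for w in generators
    )


# Example: Y = {y in R^2 : y >= 0} and x = (0, 1).
# The tangent cone is T_Y(x) = {w : w_1 >= 0}, generated by the directions below,
# so the normal cone is N_Y(x) = {v : v_1 <= 0, v_2 = 0}.
tangent_generators = [(1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
```

For instance, `in_polar((-2.0, 0.0), tangent_generators)` holds, whereas `(-1.0, 0.5)` fails because it makes an acute angle with the tangent direction `(0.0, 1.0)`.
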
The following theorem describes the Karush-Kuhn-Tucker (KKT) first-order necessary
conditions for optimality. Originally published in [83] and [92], its proof is
omitted, since it can be found in many standard nonlinear programming textbooks,
such as [14]. The constraint qualification mentioned in the statement of the theorem
is an additional assumption that must be satisfied in order for the theorem to hold.
Several different constraint qualifications appear in the literature, and are summarized
in [14] or [104]. Among these is the KKT constraint qualification, which is stated
immediately preceding the theorem. It can be shown that a sufficient condition for
the first-order KKT constraint qualification to hold at $x$ is that the Jacobian of the
binding constraints (i.e., those $C_i$ for which $C_i(x) = 0$) has full rank [14].

Definition 1.4 The KKT first-order constraint qualification holds at
$x \in \mathbb{R}^{n^c} \times \mathbb{Z}^{n^d}$ with respect to the continuous variables if, for every
$v \in T_{\Omega^c}(x)$, there exists a continuously differentiable feasible arc $a : [0, r] \to \mathbb{R}^{n^c}$
for some real scalar $r > 0$ such that $a(0) = x^c$ and $a'(0) = v$.

Theorem 1.5 Let $x$ be a feasible solution of the optimization problem
(1.1), and let $I = \{i : C_i(x) = 0\}$. Suppose that $f$ and $C_i$, $i \in I$, are
differentiable at $x$ with respect to the continuous variables, and that a
constraint qualification holds with respect to the continuous variables. If
$x$ is a local optimal solution, then $\nabla^c f(x)^T v \ge 0$ for all $v \in T_{\Omega^c}(x)$, and
$-\nabla^c f(x) \in N_{\Omega^c}(x)$. The point $x$ is said to satisfy the first-order
Karush-Kuhn-Tucker (KKT) necessary conditions for optimality with respect to
the continuous variables.
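
For the special case of simple bounds, the condition $\nabla^c f(x)^T v \ge 0$ for all tangent directions $v$ can be verified on a finite set of coordinate directions, because the tangent cone to a box is generated by the coordinate moves that stay inside it. The following Python sketch performs this check with the discrete variables held fixed; the objective, the box, and the function names are hypothetical.

```python
def tangent_generators_box(x, lower, upper, tol=1e-9):
    """Generators of the tangent cone to {y : lower <= y <= upper} at a feasible x."""
    n = len(x)
    gens = []
    for i in range(n):
        if x[i] < upper[i] - tol:  # room to increase coordinate i
            gens.append(tuple(1.0 if j == i else 0.0 for j in range(n)))
        if x[i] > lower[i] + tol:  # room to decrease coordinate i
            gens.append(tuple(-1.0 if j == i else 0.0 for j in range(n)))
    return gens


def satisfies_first_order(grad, x, lower, upper, tol=1e-9):
    """Check grad^T v >= 0 for every generator v of the tangent cone at x."""
    return all(
        sum(g * v for g, v in zip(grad, d)) >= -tol
        for d in tangent_generators_box(x, lower, upper)
    )


def grad_f(x):
    """Gradient of the hypothetical objective f(x) = (x1 - 2)^2 + (x2 + 1)^2."""
    return (2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0))


lower, upper = (0.0, 0.0), (1.0, 1.0)
at_corner = satisfies_first_order(grad_f((1.0, 0.0)), (1.0, 0.0), lower, upper)
at_center = satisfies_first_order(grad_f((0.5, 0.5)), (0.5, 0.5), lower, upper)
```

On the unit box, the corner (1, 0) passes the test because both feasible coordinate moves increase the objective, while the interior point (0.5, 0.5) fails because its gradient is nonzero.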

An alternative to the KKT conditions for optimality is the Fritz John conditions
(see [82]). These conditions also have a similar constraint qualification, but it is
automatically satisfied for any problem without equality constraints, as is the case
for (1.1). The Fritz John conditions are described by the following theorem, whose
proof is omitted but can be found in many textbooks, including [14] and [104].

Theorem 1.6 Let $x$ be a feasible solution of the optimization problem
(1.1). If $x$ is a local optimal solution, and $f$ and $C$ are differentiable at $x$
with respect to the continuous variables, then there exist scalars
$\mu_0, \mu_1, \ldots, \mu_m$, not all zero, such that $\mu_i C_i(x) = 0$ and $\mu_i \ge 0$ for all
$i = 1, 2, \ldots, m$, and

$$\mu_0 \nabla^c f(x) = -\sum_{i=1}^{m} \mu_i \nabla^c C_i(x).$$

The point $x$ is said to satisfy the first-order Fritz John necessary conditions
for optimality with respect to the continuous variables.

We  omit  the  discussion  of  other  optimality  conditions  which  provide  for  stronger 
results,  since  the  algorithms  discussed  in  this  document  are  generally  unable  to  satisfy 
the  hypotheses  needed  to  obtain  the  stronger  results. 

1.3  Purpose  of  the  Research 

To  date,  methods  to  solve  MVP  problems  are  limited.  In  Chapter  2,  we  will  see 
that  traditional  MINLP  methods  cannot  solve  these  problems,  and  search  heuristics 
generally  lack  the  rigor  needed  to  prove  desired  convergence  properties.  Perhaps  the 
most  promising  approach  is  the  class  of  algorithms  known  as  Generalized  Pattern 
Search  (GPS)  methods,  which  will  be  discussed  in  detail  in  subsequent  chapters.  In 
fact,  Audet  and  Dennis  recently  introduced  a  GPS  method  for  solving  MVP  problems 
with  simple  bound  constraints  [8]. 

The purpose of this research is to develop a suitable GPS algorithm to numerically
solve MVP problems with general nonlinear constraints. Specifically, the goal was to
devise the algorithm, establish conditions by which a limit point satisfies certain
optimality conditions, prove convergence to a limit point, implement the algorithm
in a suitable programming language (Matlab®), and demonstrate its effectiveness on
a real engineering design problem.

1.4 Overview

The remainder of this document is laid out as follows. Following a review of the
appropriate literature in Chapter 2, Chapter 3 summarizes important definitions and
results from the theory of positive linear dependence [38] and the Clarke nonsmooth
calculus [31] that will be used extensively throughout this document. Chapter 4
discusses in greater detail the basic GPS algorithm for NLP problems with bound and
simple linear constraints [8, 94, 95, 130], and the filter GPS algorithm for general
NLP problems [7]. Chapter 5 presents the Audet-Dennis GPS algorithm for solving
bound constrained MVP problems [8], but with new convergence results (for bound
and simple linear constraints), in which smoothness conditions are relaxed and made
more consistent with existing GPS theory. Chapter 6 presents the new class of GPS
algorithms for MVP problems with general nonlinear constraints, along with its
theoretical convergence properties. The algorithm is constructed as a generalization of the
algorithms presented in Chapters 4 and 5. Chapter 7 provides an overview of a new
Matlab implementation of the entire class of GPS algorithms discussed here, and
presents results when applied to a problem in the design of a load bearing thermal
insulation system. Chapter 8, which primarily contains the work found in [2], presents
a version of the basic GPS algorithm that uses any available derivative information
to reduce the number of function evaluations for each GPS iteration. It includes
numerical results on some test problems from the CUTE [18] set. Finally, Chapter 9
offers concluding remarks and several recommendations for future research.

Chapter 2
Literature Survey

In considering the solution of MVP problems, a discussion of the existing literature is
warranted. Specifically, three relevant classes of solution approaches will be addressed
in this chapter; namely, relaxation methods for solving MINLP problems, search
heuristics, and pattern search methods. For each class of methods, strengths and
weaknesses will be explored, relative to the type of problems we want to solve. At the
end of this discussion, we offer a summary of why generalized pattern search methods
are a logical approach for solving MVP problems with nonlinear constraints.

2.1 Methods for Solving MINLP Problems

Methods that have been developed for solving MINLP problems all involve iteratively
solving subproblems which are relaxed in some way. Since these relaxations involve,
at some point, relaxing the integrality of the integer variables, they cannot be used to
solve MVP problems. Despite this drawback, a brief overview of each class of these
methods is included for the sake of completeness. Classes of MINLP methods include
Outer Approximation, Generalized Benders Decomposition, Branch and Bound, and
the Extended Cutting Plane method.

2.1.1 Outer Approximation

Outer Approximation was first introduced for a class of MINLP problems by Duran
and Grossmann [43] and then extended to more general problems by Fletcher and
Leyffer [46]. At each iteration, an upper bound is obtained by solving a restricted
NLP, in which discrete variables are held constant. Then, if the problem is convex
(i.e., the objective is convex and the constraints form a convex set), a lower bound is
obtained by solving a mixed integer linear program (MILP), in which linearizations of
the objective and constraint functions (at the current iterate) are cumulatively added
as constraints (i.e., a constraint added in one iteration is kept for all subsequent
iterations). At each successive step, the lower and upper bounds approach each
other, yielding an approximate solution. Variations and improvements of the method
have been proposed by Kocis and Grossmann [88], Leyffer [97], Floudas [50], and
Viswanathan and Grossmann [135].

2.1.2  Generalized  Benders  Decomposition 

Similar to outer approximation, Generalized Benders Decomposition, developed by
Geoffrion [56], solves the same NLP subproblem, but a different MILP subproblem,
which is formed by linearizing the Lagrangian function $L(x, \lambda) = f(x) + \lambda^T C(x)$
around the current point, where $\lambda \in \mathbb{R}^p$ is the vector of Lagrange multipliers.

2.1.3  Branch  and  Bound  Methods 

Branch and Bound methods were introduced by Dakin [36], and further developed
by Gupta and Ravindran [65], Borchers and Mitchell [22], and Leyffer [98]. In this
class of methods, if a continuous NLP relaxation of the MINLP does not produce a
feasible solution (i.e., one in which the integer variables take on integer values), then
the NLP solution is treated as a lower bound, and a binary tree search is performed by
implicit enumeration, where a subset of the discrete variables is fixed at each node.
For example, for the discrete variable $x$ at a particular node, if an iteration yields
$x = 3.5$, the two branching problems emanating from that node would typically add
the constraints $x \le 3$ and $x \ge 4$, respectively. Several efficiencies are added to
prevent testing unnecessary nodes (fathoming). Quesada and Grossmann [118] propose a
clever LP/NLP based branch and bound method for a subclass of MINLPs, in which
discrete variables are binary, while Leyffer [98] proposes a basic sequential quadratic
programming (SQP) branch and bound algorithm.
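
The branching step in the example above can be sketched in a few lines of Python. This is only an illustration of how the two child nodes tighten the bounds on one variable; the dictionary representation of a node and the function names are assumptions, and no relaxation solver or fathoming rule is included.

```python
import math


def is_integral(value, tol=1e-9):
    """True if the relaxed value is already (numerically) an integer."""
    return abs(value - round(value)) <= tol


def branch(bounds, var, value):
    """Split a node on a fractional relaxed value of one variable.

    bounds maps variable names to (lower, upper) pairs. The left child adds
    x <= floor(value) and the right child adds x >= ceil(value). Callers should
    branch only when is_integral(value) is False.
    """
    lo, hi = bounds[var]
    left = dict(bounds)
    left[var] = (lo, math.floor(value))
    right = dict(bounds)
    right[var] = (math.ceil(value), hi)
    return left, right


left_child, right_child = branch({"x": (0, 10)}, "x", 3.5)
```

With a relaxed value of 3.5 and original bounds [0, 10], the children carry the bounds [0, 3] and [4, 10], which together exclude the fractional point without removing any integer value.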

2.1.4  Extended  Cutting  Plane  Method 

The Extended Cutting Plane method was introduced by Westerlund and Pettersson
[136] as an extension of Kelley’s [84] cutting plane algorithm for NLPs. In this
algorithm, a non-decreasing sequence of lower bounds is generated by solving a sequence of
MILPs, in which each MILP adds as a constraint a linearization of the most violated
MINLP constraint evaluated at the previous suboptimal point. Termination with
a solution occurs when the maximum constraint violation falls below a user-defined
threshold.
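
To show how accumulated linearizations produce a non-decreasing sequence of lower bounds, the sketch below applies a Kelley-style cutting plane iteration to a one-dimensional convex function on an interval. It is not the Extended Cutting Plane method itself: there are no integer variables, and the MILP subproblem is replaced by an exact minimization of the piecewise-linear model, which in one dimension only requires checking the endpoints and the points where two cuts cross. The function names and the test function are hypothetical.

```python
def kelley_1d(g, dg, a, b, tol=1e-6, max_iter=100):
    """Cutting-plane minimization of a convex differentiable g on [a, b].

    Returns (x_best, lower_bounds), where lower_bounds[k] is the minimum of the
    piecewise-linear model built from the first k + 1 cuts.
    """
    cuts = []  # each cut (m, c) encodes the underestimator g(t) >= m * t + c
    lower_bounds = []
    x = a
    x_best, g_best = a, g(a)

    def model(t):
        return max(m * t + c for m, c in cuts)

    for _ in range(max_iter):
        gx = g(x)
        if gx < g_best:
            x_best, g_best = x, gx
        slope = dg(x)
        cuts.append((slope, gx - slope * x))  # linearization of g at x

        # The model is convex and piecewise linear, so its minimum over [a, b]
        # is attained at an endpoint or at a crossing of two cuts.
        candidates = [a, b]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (m1, c1), (m2, c2) = cuts[i], cuts[j]
                if m1 != m2:
                    t = (c2 - c1) / (m1 - m2)
                    if a <= t <= b:
                        candidates.append(t)
        x = min(candidates, key=model)
        lower_bounds.append(model(x))

        if g_best - lower_bounds[-1] <= tol:  # best value meets the lower bound
            break
    return x_best, lower_bounds


x_star, bounds_seq = kelley_1d(lambda t: (t - 1.0) ** 2,
                               lambda t: 2.0 * (t - 1.0), -2.0, 3.0)
```

Because cuts are never discarded, each new model lies on or above the previous one, so its minimum cannot decrease; the iteration stops once the best function value found is within the tolerance of that lower bound.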

2.1.5  Suitability  of  MINLP  Methods 

All of the current methods for solving MINLP problems have drawbacks that preclude
their use on MVP problems. First, objective and constraint functions are often not
convex, a condition on which convergence results for these methods greatly rely. In
fact, while heuristics exist to alleviate problems with non-convexities, no local
convergence theory has been developed for non-convex problems. Second, methods that
linearize objective, constraint, or Lagrangian functions make use of first-order Taylor
series, which requires differentiability of those functions; this cannot be guaranteed
here. Some of these methods also require good estimates of Lagrange multipliers,
which can be problematic. Third (and most importantly), all of these methods rely
on some form of integer relaxation, which cannot be applied to categorical variables,
since the functions are not defined outside the domain of the variables. For example,
a simulation that computes function values may only allow a fixed set of input values
for each categorical variable and no others.

2.2  Search  Heuristics 

Search heuristics are methods designed to find global optima without using derivative
information by methodically searching the solution space. Many are inspired by
physical or biological processes, but lack the necessary theory to guarantee convergence to
a solution satisfying first-order optimality conditions. The classes of search heuristics
most relevant to solving MVP problems are simulated annealing, tabu search, and
evolutionary algorithms, each of which is now discussed.

2.2.1  Simulated  Annealing 

Simulated annealing is a modified Monte Carlo search technique originally devised by
Metropolis et al. [109], as an approach to numerically computing equations of state
and other properties of interacting molecules. In an annealing process, a melt of a
substance begins at high temperature and is slowly cooled while trying to maintain
thermodynamic equilibrium. If the cooling is done too quickly, or if the starting
temperature is too low, then deformations of the material or other undesirable effects
may occur, leading to a less-than-optimal result, which is analogous to attainment of
a local minimum, rather than a global minimum. Metropolis et al. attempted to
minimize the energy $E : \mathbb{R}^n \to \mathbb{R}$ of a system as it is cooled to its final temperature
$T = 0$ (or in the general case, $T = T_{\min}$ for some minimum temperature $T_{\min}$),
essentially by the algorithm shown in Figure 2.1 with $T$ decremented in accordance
with a user-specified cooling schedule.

Simulated annealing was formally introduced as a general combinatorial optimization
technique by Kirkpatrick, Gelatt, and Vecchi [87], and by Cerny [29] independently.
Much of the theory for combinatorial optimization problems relies on the
asymptotic behavior of Markov chains (e.g., see [55], [57], [102], [111], and [115]).
Hajek [66] is generally given credit for first establishing necessary and sufficient
conditions for proving that the algorithm converges to a global minimum in probability.
Specifically, the cooling schedule must be deterministic and decrease to zero, and the
state space must be finite.

Simulated Annealing Algorithm

Choose initial point $x \in \mathbb{R}^n$ at random
While ($T > T_{\min}$) do

• Randomly generate a neighbor $y$ of $x$

• If ($E(y) < E(x)$), then

  - Set $x = y$

  Else

  - Set $x = y$ with probability $e^{-(E(y) - E(x))/T}$

• Decrease $T$ slightly (according to the cooling schedule)

End

Figure 2.1 A Simulated Annealing Algorithm
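
The procedure of Figure 2.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the geometric cooling schedule, the default temperatures, the seeded random generator, the user-supplied `neighbor` function, and the bookkeeping of the best point visited are all choices made here and are not part of the figure.

```python
import math
import random


def simulated_annealing(E, x0, neighbor, T0=1.0, T_min=1e-3, cooling=0.95, rng=None):
    """Minimize E by simulated annealing, returning the best point visited.

    neighbor(x, rng) must return a random neighbor of x. The temperature is
    reduced geometrically (an assumed cooling schedule) until it reaches T_min.
    """
    rng = rng or random.Random(0)  # seeded for reproducibility
    x, best, T = x0, x0, T0
    while T > T_min:
        y = neighbor(x, rng)
        dE = E(y) - E(x)
        # Always accept improvements; accept uphill moves with probability
        # exp(-(E(y) - E(x)) / T), which shrinks as T decreases.
        if dE < 0 or rng.random() < math.exp(-dE / T):
            x = y
        if E(x) < E(best):
            best = x
        T *= cooling
    return best


def energy(x):
    """Hypothetical one-dimensional energy with its minimum at x = 2."""
    return (x - 2.0) ** 2


def random_step(x, rng):
    """Hypothetical neighbor rule: a uniform step of length at most 0.5."""
    return x + rng.uniform(-0.5, 0.5)


result = simulated_annealing(energy, 3.0, random_step)
```

Early in the run, when $T$ is large, uphill moves are accepted often, which lets the search leave local minima; as $T$ falls, the method behaves more and more like a pure descent.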

In a letter to the editor of Operations Research, Pincus [117] first suggested the
application of Metropolis’ work to the minimization of real continuous functions over
a compact set in $\mathbb{R}^n$. Actual adaptations of the simulated annealing algorithm for
continuous variable problems have been proposed by many authors [17, 127, 134].
For example, Bilbro and Snyder [17] introduce what they term tree annealing, in
which they add a tree structure to the original Metropolis scheme to handle the
continuous variables. Szu and Hartley [127] proved convergence in probability to a
global minimum for a specific choice of cooling schedule, provided that the search
directions are generated according to a Cauchy distribution. In what remains an
active area of research, convergence proofs for other classes of annealing algorithms,
having various assumptions on cooling schedules and probability distributions, can be
found in [16], [54], [100] and [101], among many others. An excellent overview of
simulated annealing and description of its convergence properties is found in [131].

Although convergence in probability to the global optimum is certainly a desirable
result, computational studies show that the rate of convergence is often slow [79].
A theoretical explanation for this can be found in [102], where the time to convergence
on some combinatorial optimization problems is shown to be exponentially long.
Furthermore, Bilbro and Snyder [17] and Rutenbar [121] identify problems for which
simulated annealing algorithms are unable to find the global minimum in finite time.

Ingber  [78]  offers  some  algorithmic  improvements  without  sacrificing  theoretical 
convergence.  His  implementation  shows  favorable  performance  results  compared  to 
that  of  the  traditional  simulated  annealing  [79]  and  genetic  algorithms  [80]. 

2.2.2  Tabu  Search 

The  tabu  search  was  formally  introduced  by  Glover  [59,  60,  61]  as  a  metaheuristic 
for  solving  combinatorial  optimization  problems,  although  several  of  its  ideas  were 
developed  independently  by  Hansen  [67].  The  basic  idea  is  that,  in  searching  for  a 
better  point  among  its  discrete  neighbors  (defined  by  the  user),  the  algorithm  may 
accept  a  worse  point  if  no  better  ones  are  found,  and  if  the  candidate  point  is  not 
already on a list of forbidden points (called a tabu list). Thus, if a local optimum
is found, it is added to the tabu list, and the algorithm moves to another point in
an area of the domain that has not been searched yet. The tabu list is designed to
prevent the algorithm from returning to local minima. Key to this approach
is the management of the tabu list, and the strategies for three concepts, known as
aspiration, diversification, and intensification. Aspiration refers to a function and
conditions by which the tabu list is overridden to include good points. Diversification
and intensification refer to strategies for searching globally or locally, respectively.
Variations of tabu search differ primarily in these three areas, as well as the data
structures used for managing the tabu list.

A simple tabu search procedure is given in Figure 2.2, where we define the set
$N(x)$ as the discrete set of feasible neighbors at $x$, the set $T$ as the tabu list, $k_{\max}$
as the maximum number of iterations, and $L$ as the maximum number of iterations
since the incumbent solution $x^*$ was last updated.

Tabu  Search  Algorithm 

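Select an initial $x \in \Omega$ and let $x^* = x$, $T = \emptyset$
For $k = 1, 2, \ldots$

• If $N(x) \setminus T \neq \emptyset$, then

- Set $s^* \in \arg\min \{ f(s) : s \in N(x) \setminus T \}$

- Set $x^* \in \arg\min \{ f(s^*), f(x^*) \}$

• If ($k > k_{\max}$ or $k > L$ or $N(x) \setminus T = \emptyset$), STOP

• Update $T$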

Figure  2.2  A  Tabu  Search  Algorithm 
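
The outline in Figure 2.2 can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not specified in the figure: the current point moves to the best non-tabu neighbor at every iteration, every visited point is added to the tabu list, and the hypothetical parameter `stall_limit` plays the role of $L$ by counting iterations since the incumbent last improved.

```python
def tabu_search(f, neighbors, x0, k_max=100, stall_limit=20):
    """Minimize f over a discrete space, loosely following Figure 2.2.

    f         : objective function on hashable points
    neighbors : function returning the discrete neighbors N(x) of a point
    """
    x = x0
    best = x0
    tabu = {x0}          # tabu list T (assumption: every visited point is tabu)
    stall = 0            # iterations since the incumbent last improved
    for _ in range(k_max):
        candidates = [s for s in neighbors(x) if s not in tabu]
        if not candidates:            # N(x) \ T is empty: stop
            break
        s_star = min(candidates, key=f)
        if f(s_star) < f(best):       # update the incumbent x*
            best = s_star
            stall = 0
        else:
            stall += 1
        x = s_star                    # may accept a worse point (assumed move rule)
        tabu.add(x)
        if stall >= stall_limit:
            break
    return best
```

Because the move to `s_star` is made even when it is worse than the current point, the search can leave a local minimum, which is the behavior described in the text.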

Glover  [62]  offers  ideas  for  extending  the  tabu  search  heuristic  to  nonlinear  and 
parametric  optimization  problems.  He  points  out  that  the  tabu  list  must  incorporate 
some  notion  of  distance  between  the  current  point  and  the  points  in  the  list.  The 
ideas he proposes mainly involve directional search strategies and scatter search.
Although  tabu  search  is  widely  used  and  has  performed  well  on  many  combinatorial 
optimization  problems  in  practice,  there  is  no  formal  convergence  theory  to  guarantee 
its  success. 

2.2.3  Evolutionary  Algorithms 

Evolutionary  algorithms  encompass  a  large  family  of  numerical  methods  that  model 
the  evolutionary  processes  seen  in  biology.  Among  these  are  three  classes  of  algorithms 
that  have  been  applied  successfully  to  optimization  problems;  namely,  Evolution 
Strategies, Evolutionary Programming, and Genetic Algorithms. The latter, which
are the best known, have been primarily used on discrete optimization problems,
while the other two were designed primarily for continuous problems. Genetic
algorithms  have  been  applied  to  continuous  problems  as  well,  but  to  do  so  generally 
requires  a  discrete  encoding  of  the  problem. 

Rather than dealing with one iterate at a time, evolutionary algorithms deal
simultaneously with finitely many points, called a population. Each point or individual
in the population is evaluated with respect to a fitness function, which is analogous to
an objective function. Each generation then reproduces when a subset of individuals
in the population, called parents, are selected based on some specified criteria, and
various search operators are applied to the parents to create new individuals, called
offspring. Typically, the selection and reproduction processes ensure that offspring,
on average (i.e., expected value), will have better fitness values than the parents;
however, this is not guaranteed, due to the randomness inherent in the reproduction
process.

The search operators are typically given biological names, the most common of
which are selection, reproduction (often called recombination or crossover),
mutation, and competition [122]. They are the tools by which new candidate points are
generated in the optimization process. These can be described briefly as follows:

•  Selection:  Process  by  which  parents  are  selected  for  reproduction. 

•  Reproduction:  Process  by  which  genes  are  passed  from  parents  to  offspring. 

•  Mutation:  Random  errors  occurring  in  the  reproduction  process. 

•  Competition:  Process  by  which  offspring  survive  to  the  next  generation  when 
there  are  limited  resources. 

Given these definitions, it is helpful to point out a few examples. Suppose parents
$x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ have been selected. One common
recombination operator produces two complementary offspring $z, w \in \mathbb{R}^n$, in which
$z_i = x_i$, $w_i = y_i$ with probability $p_i$; otherwise, $z_i = y_i$, $w_i = x_i$ [138]. Another
common operator computes some linear combination of the parents’ characteristics
for each component. Mutation operators may involve adding some random values to
components of a member of the population. The random values are typically
statistically independent and come from some pre-defined probability distribution, most
commonly Gaussian or Cauchy.
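
The operators just described can be sketched in a few lines of Python. The function names, the weight `alpha`, and the use of a single standard deviation `sigma` for all components are illustrative assumptions rather than part of any cited method.

```python
import random


def discrete_recombination(x, y, p, rng):
    """Return complementary offspring (z, w).

    With probability p[i], z[i] = x[i] and w[i] = y[i]; otherwise swapped.
    """
    z, w = [], []
    for xi, yi, pi in zip(x, y, p):
        if rng.random() < pi:
            z.append(xi)
            w.append(yi)
        else:
            z.append(yi)
            w.append(xi)
    return z, w


def intermediate_recombination(x, y, alpha=0.5):
    """Offspring as a componentwise linear combination of the parents."""
    return [alpha * xi + (1.0 - alpha) * yi for xi, yi in zip(x, y)]


def gaussian_mutation(x, sigma, rng):
    """Add independent N(0, sigma) noise to each component."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]
```

A `random.Random` instance is passed in explicitly so that runs can be reproduced by fixing the seed.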

A  general  framework  [138]  for  Evolutionary  Algorithms  is  given  in  Figure  2.3,  the 
details  of  which  will  define  the  various  classes  of  algorithms. 

Genetic  Algorithms 

Genetic  Algorithms  were  developed  by  Holland  [74,  75]  as  a  model  of  Darwinian  [37] 
theory  of  genetic  evolution,  where  members  of  a  population  with  higher  fitness  values 
are  given  a  higher  probability  of  reproducing.  In  most  implementations,  variables 
are  encoded  as  binary  strings,  and  the  typical  approach  is  to  include  the  four  search 
Evolutionary Algorithm

Generate the initial population P(0) at random
For i = 0, 1, 2, ...

• Evaluate the fitness of each individual in P(i)

• Select parents from P(i)

• Apply search operators to the parents and produce generation P(i + 1)

• Check for convergence or termination condition

Figure  2.3  An  Evolutionary  Algorithm 

operators  in  the  genetic  process.  Genetic  algorithms  were  first  applied  to  optimization 
by  De  Jong  [39],  a  student  of  Holland. 

Key  to  the  algorithm  is  the  choice  of  fitness  function.  This  may  be  the  same  as 
the  objective  function  of  the  problem  being  optimized,  but  often  is  not,  in  an  effort 
to  increase  efficiency.  However,  the  extrema  of  the  two  functions  must  match  [122]. 

Selection  of  parents  for  reproduction  is  usually  done  randomly,  with  increased 
probability  of  selection  for  individuals  with  better  fitness  values.  The  most  common 
approaches  select  parents  based  on  proportional  values  of  the  fitness  functions  [63], 
ordinal  rankings  of  individuals  by  fitness  value  [11],  or  a  combination  of  the  two  [63, 
110]. 

In  genetic  algorithms,  since  the  individuals  are  usually  represented  by  binary 
strings,  the  crossover  (reproduction)  operator  involves  sampling  bits  from  the  two 
parents.  One  of  the  most  common  schemes  generates  two  children  by  copying  the 
parents  and  then  swapping  (proper)  substrings  of  the  binary  encoding.  Mutation 
typically occurs with low probability and usually involves resetting a randomly
selected bit to its complement.
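
As a rough illustration of these genetic operators on binary strings, the sketch below implements fitness-proportional selection, a one-point crossover that swaps proper substrings, and bit-flip mutation. It assumes nonnegative fitness values where larger is better; the function names and the `rate` parameter are hypothetical choices for this example.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    r = rng.random() * total
    acc = 0.0
    for individual, fit in zip(population, fitness):
        acc += fit
        if r < acc:
            return individual
    return population[-1]


def one_point_crossover(a, b, rng):
    """Copy both parents, then swap the tails after a random cut point."""
    cut = rng.randrange(1, len(a))   # 1 <= cut < len(a): a proper substring
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip_mutation(bits, rate, rng):
    """Flip each bit independently with (typically low) probability rate."""
    return [1 - b if rng.random() < rate else b for b in bits]
```

In a full genetic algorithm these three functions would be called once per generation inside the loop of Figure 2.3.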

A  brief  survey  of  convergence  theory  results  for  genetic  algorithms  can  be  found 
in  [110].  A  Markov  chain  analysis  of  genetic  algorithms  with  finite  populations  is 
given  in  [64],  but  only  considers  reproduction  and  mutation  operators.  Kingdon  [86] 
studied  and  attempted  to  characterize  problems  that  genetic  algorithms  have  difficulty 
solving,  and  provides  some  convergence  results.  Rudolph  [120]  proved  that  the  only 
genetic  algorithms  that  converge  to  a  global  optimum  are  those  that  always  maintain 
the  best  solution  in  the  population.  Michalewicz  [110]  introduced  a  new  class  of 
so-called  contractive  mapping  genetic  algorithms,  which,  under  certain  conditions, 
converge  (not  just  in  probability)  based  on  Banach’s  well-known  contraction  mapping 
theorem  [12].  A  drawback  of  this  approach  is  that  the  contraction  mapping  requires 
an improvement of the entire population’s average fitness at each iteration. If an
improved population is not found, then more iterations are performed until one is
found.  As  the  algorithm  progresses,  this  becomes  less  and  less  likely,  which  slows 
convergence  considerably. 

In practice, genetic algorithms have additional drawbacks that make them unsuitable
for solving the type of MVP problems investigated here. For one, while they are
generally good at finding improved designs quickly, they are not reliable at finding
global optima (which is their goal), and when they do converge to a local optimum,
it is often at a slower rate than traditional gradient or so-called hill-climbing
methods [52, 128]. Furthermore, because of their very nature, the requirement to generate
new populations rather than single iterates makes them even more costly with respect
to the number of function evaluations.
Evolution Strategies

Rechenberg  and  Schwefel  first  proposed  evolution  strategies  in  1965  as  a  numerical 
optimization  technique,  although  not  in  its  current  form  [124]. 

In evolution strategies, each individual is represented by a pair of real-valued
vectors $(x, \sigma)$, where $x$ is its position in the search space, and $\sigma$ is a vector of standard
deviations. The mutation operator is typically performed by $x_{k+1} = x_k + N(0, \sigma)$,
where $N(0, \sigma)$ is a vector of independent normally distributed random numbers with
mean 0 and standard deviation $\sigma$. Since the selection of $\sigma$ is highly dependent on
the problem and its dimensionality, Schwefel [123] proposed a concept referred to as
self-adaptation, in which the mutation operator is applied to $\sigma$ as well as $x$.

The two primary selection operators are referred to as $(\lambda + \mu)$ and $(\lambda, \mu)$, where $\mu$
represents the number of parents and $\lambda$ represents the number of offspring. After the
$\lambda$ offspring are generated, the $\mu$ fittest individuals are selected, either from the total
$\mu + \lambda$ candidates or only from the $\lambda$ offspring, depending on which operator is chosen.
Offspring that violate any constraint of the optimization problem are discarded in
favor of the parents (which are feasible).
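
The mutation and selection operators above can be sketched as follows. The log-normal update of the standard deviations and the learning rate `tau` are common choices assumed here for illustration; the text only states that mutation is applied to the standard deviations as well as to the position. The `cost` function is minimized, so "fittest" means lowest cost.

```python
import math
import random


def self_adaptive_mutation(x, sigma, rng, tau=None):
    """Mutate the standard deviations first, then the position vector."""
    if tau is None:
        tau = 1.0 / math.sqrt(len(x))      # assumed learning rate
    new_sigma = [s * math.exp(tau * rng.gauss(0.0, 1.0)) for s in sigma]
    new_x = [xi + rng.gauss(0.0, s) for xi, s in zip(x, new_sigma)]
    return new_x, new_sigma


def survivor_selection(parents, offspring, mu, cost, plus=True):
    """Keep the mu best individuals.

    plus=True  selects from parents and offspring together;
    plus=False selects from the offspring only.
    """
    pool = parents + offspring if plus else offspring
    return sorted(pool, key=cost)[:mu]
```

With `plus=True` the best parent can never be lost, whereas with `plus=False` every parent is replaced in each generation.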

The  two  primary  recombination  operators  are  exactly  those  mentioned  earlier; 
namely,  given  parents  x  and  y,  each  component  of  one  offspring  is  randomly  chosen 
from  the  corresponding  components  of  the  two  parents,  while  the  second  offspring  is  its 
complement;  or  an  offspring’s  components  are  generated  as  some  linear  combination 
of  the  parents’  components. 

Convergence in probability can be proved for optimization problems with mild
assumptions (including continuity of the objective function), provided that the
components of $\sigma$ are identical and positive [9]. However, the question as to the convergence
rate of evolution strategies remains open.
Evolutionary Programming

Evolutionary Programming was first introduced as an artificial intelligence
technique applied to finite state machines by Fogel [51], and it was later extended by
Burgin [26, 27] and Atmar [3]. Though developed under entirely different
circumstances, it is actually quite similar to evolution strategies. For example, each
individual is represented by a pair of vectors $(x, \sigma)$ in precisely the same way as in evolution
strategies. The mutation operator is also essentially identical; however, evolutionary
programming has no recombination operator [10]. Other than that, the primary
differences between the two approaches involve the selection and competition processes.

Rather than selecting the best offspring, evolutionary programming uses a
tournament approach. Once the $\mu$ offspring have been generated, yielding $\lambda + \mu$ individuals,
a subset of the individuals are selected uniformly at random as opponents for each
individual. Then each individual receives a score corresponding to the number of
opponents with a worse fitness value. The next generation’s parents are chosen as
the $\mu$ individuals with the highest scores.
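
A minimal sketch of this tournament scheme is given below. The number of opponents `q` and the convention that a lower cost is a better fitness are assumptions made for the example.

```python
import random


def tournament_scores(costs, q, rng):
    """Score each individual by how many of q random opponents are worse."""
    n = len(costs)
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        opponents = rng.sample(others, q)     # uniform, without replacement
        scores.append(sum(1 for j in opponents if costs[j] > costs[i]))
    return scores


def ep_select(individuals, costs, mu, q, rng):
    """Return the mu individuals with the highest tournament scores."""
    scores = tournament_scores(costs, q, rng)
    order = sorted(range(len(individuals)), key=lambda i: scores[i], reverse=True)
    return [individuals[i] for i in order[:mu]]
```

When `q` is smaller than the population size minus one, a mediocre individual can survive by drawing weak opponents, which is the source of randomness in this selection rule.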

2.2.4  Suitability  of  Search  Heuristics 

The  search  heuristics  presented  here  are  all  useful  methods  in  optimization  when 
the  goal  is  to  improve  a  design  or  find  a  better  point.  In  particular,  they  have  few 
restrictions  on  the  types  of  problems  to  which  they  can  be  applied.  However,  though 
the  goal  in  many  cases  is  to  find  a  global  optimum,  adequate  theory  to  do  so  is 
generally  absent. 

Tabu  search  is  an  intriguing  idea  for  avoiding  local  optima,  which  has  performed 
well  in  a  variety  of  problems,  but  it  is  geared  mainly  toward  discrete  optimization,  and 
there  are  simply  no  convergence  guarantees.  Simulated  annealing,  under  a  specific 
set of assumptions, converges in probability to a global optimum, but, in practice,
is often much slower than traditional methods, due to the probabilistic nature of
the algorithm. Evolutionary algorithms offer little by way of theoretical convergence
guarantees without becoming overly problem-dependent, except in one particular
case of evolution strategies. However, in practice, the convergence rate is often
slow  (similar  to  simulated  annealing),  and  early  movement  toward  a  local  optimum  is 
common.  This  inefficiency  is  magnified  as  the  dimension  of  the  problem  is  increased. 

2.3  Generalized  Pattern  Search  (GPS)  Methods 

Pattern  search  methods  represent  a  subclass  of  direct  search  algorithms,  in  which  the 
minimizer  of  a  continuous  function  is  sought  without  the  use  of  derivatives.  Among 
the  first  direct  search  algorithms  were  the  well-known  method  of  Hooke  and  Jeeves  [76] 
and the simplex algorithm of Nelder and Mead [113]. At the time, these were
considered heuristics with no formal convergence theory.

2.3.1  Lewis  and  Torczon 

In an award-winning 1997 paper, Torczon [130] introduced the class of generalized
pattern search (GPS) methods for solving unconstrained NLP problems, and showed
that it includes coordinate search with fixed step sizes, evolutionary operation using
factorial design [23], Hooke and Jeeves’ algorithm [76], and the multidirectional search
algorithm [42]. Without ever computing or approximating derivatives, Torczon [130]
also shows that if all iterates lie in a compact set and the objective function $f$ is
continuously differentiable in a neighborhood of the level set $\{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$,
where $x_0 \in \mathbb{R}^n$ is the initial iterate, then a subsequence of GPS iterates $\{x_k\}$ converges
to a point $\hat{x}$ satisfying $\nabla f(\hat{x}) = 0$.


At each iteration of Torczon’s algorithm, the objective function is evaluated on a
finite set of neighboring points on a carefully constructed discrete mesh, formed by
considering nonnegative integer combinations of vectors that form a positive
spanning set [93] (see Definition 3.2). If an improvement is found, then the new iterate is
accepted and the mesh is retained or coarsened; otherwise, the mesh is refined and
a new set of neighboring mesh points is evaluated. Torczon shows convergence by
showing that the mesh size gets arbitrarily small. In [93], Lewis and Torczon
generalize their algorithm by applying the theory of positive linear dependence of Davis [38]
to reduce the worst case number of trial points at each iteration. They also introduce
a heuristic, called rank ordering, in which, after evaluating $f$ in directions forming a
basis, they generate a new direction, based on the difference between the best and
worst directions of the basis, with the expectation that, for a sufficiently fine mesh,
this new direction will provide a crude estimate for the direction of steepest descent.
Similar ideas, but in a slightly different context, can be found in [34].
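
To make the poll-and-refine mechanism concrete, the following sketch implements the simplest member of the GPS class, coordinate search with the $2n$ directions given by the columns of $[I, -I]$. The halving factor, the stopping tolerance `tol`, and the choice to accept the first improving trial point are assumptions for illustration; the rank ordering heuristic is not included.

```python
def coordinate_pattern_search(f, x0, delta0=1.0, tol=1e-6, max_iter=10000):
    """Minimize f by polling the mesh neighbors x + delta * d, d in [I, -I]."""
    n = len(x0)
    x = list(x0)
    fx = f(x)
    delta = delta0
    # Columns of [I, -I]: a positive spanning set with 2n directions.
    directions = []
    for i in range(n):
        for sign in (1.0, -1.0):
            d = [0.0] * n
            d[i] = sign
            directions.append(d)
    for _ in range(max_iter):
        if delta < tol:
            break
        improved = False
        for d in directions:
            trial = [xi + delta * di for xi, di in zip(x, d)]
            f_trial = f(trial)
            if f_trial < fx:          # simple decrease is enough
                x, fx = trial, f_trial
                improved = True
                break
        if not improved:
            delta *= 0.5              # refine the mesh after a failed poll
    return x, fx, delta
```

For example, `coordinate_pattern_search(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2, [0.0, 0.0])` walks along the mesh to the minimizer and then shrinks the mesh size until it falls below `tol`.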

Lewis and Torczon have extended the GPS method to solve both bound [94] and
linearly constrained problems [95]. In doing so, they showed that by choosing the
search directions appropriately, and if the objective function $f$ is continuously
differentiable in a neighborhood of the level set $\{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$, the algorithm is
guaranteed to produce a subsequence of iterates converging to a limit point $\hat{x}$
satisfying $\nabla f(\hat{x})^T (x - \hat{x}) \ge 0$ for any feasible $x$. Audet [4] provides several clever examples
to show that many of Torczon’s theoretical results cannot be relaxed.

For NLP problems with nonlinear constraints, Lewis and Torczon [96] developed
a derivative-free augmented Lagrangian version of GPS. The augmented Lagrangian
they use, which comes from Conn, Gould, and Toint [32], is given by

$$\Phi(x; \lambda, S, \mu) = f(x) + \sum_{i=1}^{p} \lambda_i C_i(x) + \frac{1}{2\mu} \sum_{i=1}^{p} s_{ii} C_i(x)^2, \qquad (2.1)$$

where $\lambda = (\lambda_1, \ldots, \lambda_p)^T$ is the vector of Lagrange multiplier estimates for the equality
constraints $C_i(x)$, $i = 1, 2, \ldots, p$ (inequality constraints are assumed to have had slack
variables added, as appropriate), $\mu$ is a penalty parameter, and the entries $s_{ii}$ of the
diagonal matrix $S$ are scaling factors for the constraints. Thus, in this formulation, the
nonlinear (and linear) constraints are incorporated into the augmented Lagrangian,
while the simple bound constraints remain as explicit constraints. The GPS algorithm
described in [94] is then applied to the resulting bound constrained subproblem in
an iterative fashion. However, each subproblem (denoted by $j$) is only solved until
the mesh size parameter satisfies $\Delta_k < \delta_j$, where $\delta_j \to 0$ and ho ≪ 1. The Lagrange
multiplier estimates are updated by the same Hestenes-Powell formula as in [32];
namely,

$$\bar{\lambda}(x, \lambda, S, \mu) = \lambda + \frac{1}{\mu} S C(x), \qquad (2.2)$$

which  requires  no  derivative  information. 
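
The multiplier update (2.2) uses only constraint values, as the short sketch below shows. The form of the augmented Lagrangian coded here, $f(x) + \lambda^T C(x) + \frac{1}{2\mu} \sum_i s_{ii} C_i(x)^2$, is the usual Conn, Gould, and Toint form and is stated as an assumption; the function names are hypothetical, and the diagonal of $S$ is passed as a plain list `s`.

```python
def augmented_lagrangian(f, c, x, lam, s, mu):
    """Evaluate f(x) + lam^T c(x) + (1 / (2 mu)) * sum_i s_i * c_i(x)^2."""
    cx = c(x)
    linear = sum(l * ci for l, ci in zip(lam, cx))
    penalty = sum(si * ci * ci for si, ci in zip(s, cx)) / (2.0 * mu)
    return f(x) + linear + penalty


def update_multipliers(c, x, lam, s, mu):
    """First-order update (2.2): lam + (1 / mu) * S * c(x)."""
    return [l + si * ci / mu for l, si, ci in zip(lam, s, c(x))]
```

Neither function evaluates a gradient, which is why the update fits naturally into a derivative-free framework.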

Lewis  and  Torczon  show  that  this  construction,  under  the  same  assumptions  as 
in  [32],  plus  a  mild  restriction  on  search  directions,  yields  an  algorithm  that  converges 
to  a  KKT  first-order  stationary  point.  The  main  drawback  is  that  no  strategy  for 
reducing  the  penalty  parameter  is  given,  and  no  numerical  results  are  provided.  Thus, 
the  efficiency  of  the  algorithm  is,  for  the  most  part,  unknown. 

An  asynchronous  parallel  version  of  the  GPS  algorithm  has  also  been  developed 
by  Hough,  Kolda  and  Torczon  [77,  90,  91]. 

2.3.2  Audet  and  Dennis 

Audet and Dennis [6] present a hierarchy of convergence results for a slightly
modified, but equivalent version of GPS for bound and linear constraints, in which the
strength of the results depends on local continuity and smoothness conditions of the
objective function. By so doing, they establish Torczon’s work as a corollary of theirs
with much shorter and simpler proofs. This hierarchy of results is described in detail
in Chapter 4.

Two additional GPS papers, which are discussed in detail in Chapters 4 and 5,
are extremely relevant to, and in fact, form the foundation of this work. In the
first paper [7], Audet and Dennis incorporate a filter method into the GPS
framework to handle general nonlinear constraints. Originally introduced by Fletcher and
Leyffer [47] to conveniently globalize sequential quadratic programming (SQP) and
sequential linear programming (SLP), filter methods accept steps if either the objective
function or an aggregate constraint violation function is reduced. Fletcher, Leyffer,
and Toint [48] show convergence of the SLP-based approach to a limit point satisfying
Fritz John [82] optimality conditions; they show convergence of the SQP approach to
a KKT point [49], provided a constraint qualification is satisfied. However, in both
cases, more than a simple decrease in the function values is required for convergence
with these properties. Audet and Dennis show convergence to limit points having
almost the same characterization as in [6], but with only a simple decrease in the
objective or constraint violation function required. While they are unable to show
convergence to a point satisfying KKT optimality conditions (and, in fact, have
counterexamples [7]), in that $-\nabla f(\hat{x})$ does not necessarily belong to the normal cone, they
are able to show that $-\nabla f(\hat{x})$ belongs to a larger cone containing the normal cone,
the size of which depends on the choice of directions used. A richer set of directions,
although more costly, spans more of the tangent cone; thus, its polar shrinks, so as
to increase the likelihood of achieving a KKT point.
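
The acceptance rule behind a filter can be illustrated with a generic sketch. It stores pairs $(h, f)$ of constraint violation and objective value and accepts a trial point whenever no stored pair is at least as good in both measures. This is a simplified textbook-style filter written for illustration, not the specific rules of [7] or [47]; the function names are hypothetical.

```python
def is_dominated(h, f, filt):
    """True if some filter entry is no worse in both violation and objective."""
    return any(hf <= h and ff <= f for hf, ff in filt)


def try_accept(h, f, filt):
    """Accept (h, f) if it is not dominated; update the filter in place.

    Returns True when the trial point is accepted.
    """
    if is_dominated(h, f, filt):
        return False
    # Remove entries that the new point dominates, then add the new point.
    filt[:] = [(hf, ff) for hf, ff in filt if not (h <= hf and f <= ff)]
    filt.append((h, f))
    return True
```

A point that reduces only the violation, or only the objective, is accepted, which mirrors the "simple decrease in either function" idea described above.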

In the second paper [8], Audet and Dennis introduce a GPS method to solve bound
constrained MVP problems under the assumption of continuous differentiability of
the objective function on the neighborhood of a level set in which all iterates lie.
This algorithm is a generalization of the basic GPS algorithm in that it reduces to
the basic method in the absence of discrete variables. The success of the method is
demonstrated in [89] on a problem in the design of thermal insulation systems, an
expanded version of which is discussed in detail in Chapter 7.

A key point to the Audet-Dennis GPS algorithms is that they explicitly separate
out a SEARCH step from the main POLL step within the iteration, in which any finite
search strategy (including none) on the mesh may be employed, without adversely
affecting the convergence theory. This flexibility lends itself quite easily to hybrid
algorithms and enables the user to apply specialized knowledge of the problem. One
typical implementation for difficult and computationally expensive problems is to
optimize a significantly less costly surrogate function during the SEARCH step of each
iteration, map the resulting point to a nearby mesh point, and compute its true
function value there [21]. While the SEARCH step contributes nothing to the convergence
theory of GPS (and in fact, an unsuitable SEARCH may impede performance), the
use of surrogates enables the user to gain significant improvement early on in the
iteration process at much lower cost.

Because problems with very expensive function evaluations are in the target class
of problems we wish to solve, a discussion of the use of surrogates is warranted. Given
an optimization problem in the form of (1.1), surrogate functions $s_f : X \to \mathbb{R}$ and
$s_C : X \to \mathbb{R}^p$ are constructed based either on simplified physics, or approximations
obtained by evaluating $f$ and $C$ at selected points, called data sites, $x_1, \ldots, x_r$, and
interpolating or smoothing the function values. In the SEARCH step of the GPS
algorithm, the surrogate optimization problem,

$$\min_{x \in X} \; s_f(x) \quad \text{s.t.} \quad s_C(x) \le 0, \qquad (2.3)$$

is solved approximately using any desired method. It is not important to obtain a
high  level  of  accuracy,  unless  the  surrogates  are  accurate  approximations  of  the  true 
functions. The surrogate solver may return multiple points $x_s$, which are then mapped
to the mesh. The values $f(x_s)$ and $C(x_s)$ are then computed with the hope of finding
an  improved  feasible  design  for  the  original  problem.  If  no  improvement  is  found, 
then  the  GPS  POLL  step  is  invoked,  and  more  points  are  evaluated  with  respect  to 
the  original  problem.  The  surrogate  can  then  be  recalibrated  using  the  new  points 
that  have  been  evaluated.  The  literature  contains  several  reasonable  approaches  for 
handling  the  recalibration  process  [25,  58,  132]. 
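
The mapping of surrogate solutions to the mesh can be sketched as follows for the simplest case, in which the mesh is the lattice of points `incumbent + delta * z` with integer vectors `z` (the mesh generated by the coordinate directions). Constraints are omitted, and the function names are hypothetical; this is an illustration of the idea rather than the SEARCH step of any particular implementation.

```python
def snap_to_mesh(point, center, delta):
    """Map a point to the nearest point of the lattice center + delta * Z^n."""
    return [c + delta * round((p - c) / delta) for p, c in zip(point, center)]


def search_step(f, candidates, incumbent, f_incumbent, delta):
    """Evaluate the true objective at snapped surrogate candidates.

    Returns (trial, f_trial) for the first improving mesh point, or None,
    in which case the caller would fall back to the POLL step.
    """
    for cand in candidates:
        trial = snap_to_mesh(cand, incumbent, delta)
        f_trial = f(trial)
        if f_trial < f_incumbent:
            return trial, f_trial
    return None
```

Because only finitely many mesh points are evaluated, a step of this kind leaves the poll-based convergence argument untouched.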

The building of surrogates often involves the use of surfaces, which are functions
designed to fit or smooth data chosen at the data sites. Examples include kriging,
response surfaces, polynomial interpolants, and neural networks. Data site selection
is often done using Latin hypercube sampling (LHS) [107, 126], orthogonal arrays
(OA) [116], OA-based LHS [129], or other probability-based space filling strategies,
or alternatively, by experimental design, with the goal of obtaining a reasonable and
rich sampling of the domain.
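
As an example of a space filling strategy, a basic Latin hypercube sample on the unit cube can be generated as in the sketch below. The function name and the choice of a uniformly random position within each stratum are assumptions for illustration.

```python
import random


def latin_hypercube(n_points, n_dims, rng):
    """Return n_points samples in [0, 1)^n_dims.

    In every dimension, each of the n_points equal-width strata
    contains exactly one sample.
    """
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)           # random pairing of strata across dims
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_points)]
```

Samples on a general box are obtained by scaling each coordinate to the corresponding variable bounds.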

Although  the  relationship  between  surrogates  and  surfaces  is  direct,  the  two  are 
not  necessarily  the  same.  Surfaces  may  be  constructed  based  on  the  difference  between 
the  original  and  surrogate  functions,  or  on  the  quotient  of  the  original  and  surrogate 
functions,  in  which  case,  the  surrogate  would  be  constructed  as  the  sum  or  product, 
respectively,  of  the  surface  and  surrogate  functions.  One  very  interesting  and  new 
approach  is  space  mapping  [40],  in  which  the  relationship  between  an  expensive  fine 
model  and  an  inexpensive  coarse  model  is  explored.  The  fine  and  coarse  models  are 
analogous  to  the  original  and  surrogate  models,  respectively. 




2.4  Other  Relevant  Methods 

A  few  other  papers  in  related  areas  are  also  of  interest.  First,  a  recently  published 
paper  by  Coope  and  Price  [35]  introduces  what  they  term  a  grid-based  method  for 
unconstrained  NLP  problems  with  continuously  differentiable  objective  functions.  It 
is  similar  to  that  of  Torczon  [130],  but  with  an  alternative,  more  flexible  construction 
of  the  mesh  (or  grid).  They  show  convergence  to  a  first-order  stationary  point,  but 
must  rely  on  the  assumption  that  the  mesh  size  converges  to  zero,  which  Torczon  is 
able  to  prove  because  of  the  stricter  mesh  construction.  While  this  method  does  not 
quite  fall  under  the  GPS  umbrella,  the  flexibility  of  their  grid  construction  allows 
them  to  extend  the  algorithm  by  constructing  the  grid  with  conjugate  directions, 
thereby  achieving  the  classical  result  of  finite  termination  on  strictly  convex  quadratic 
functions  [33]. 

Second, Bünner, Schittkowski, and van de Braak [24] solve problems in the design
of surface acoustic wave filters by applying an SQP method that has been extended to
handle non-relaxable integer variables by adding a direct search component. However,
the  algorithm  treats  the  direct  search  method  as  a  heuristic,  and  has  no  proven 
convergence  theory. 

Finally,  Hart  [68,  69,  70]  introduces  an  evolutionary  pattern  search  algorithm  that 
combines  the  ideas  of  evolutionary  algorithms  with  the  Torczon  [130]  GPS  algorithm 
in  a  way  that  allows  him  to  prove  probabilistic  analogs  of  Torczon’s  convergence 
results. 

2.5  Summary 

In this chapter, the main ideas of MINLP methods, search heuristics, and GPS
methods have been discussed. MINLP methods are not suitable for solving MVP problems
because all current methods would require some form of relaxation of the
integrality of the categorical variables, which is not possible. Search heuristics are also not
suitable because of the general lack of sufficient convergence theory and the large
number of function evaluations typically required. However, they can certainly be
used within the SEARCH step of a GPS algorithm, since optimizing a surrogate
function is generally inexpensive, and a high degree of accuracy is generally not required.
Finally, of the methods currently available, GPS algorithms seem best suited to solve
nonlinearly constrained MVP problems, due to the combination of flexibility and
convergence theory these methods possess, and because of already existing GPS methods
to solve both NLPs and bound constrained MVPs. In fact, the main focus of this work
is to integrate the filter method used in [7] into the GPS algorithm for MVP
problems described in [8] so that MVP problems with nonlinear constraints and weaker
smoothness conditions can be solved.
Chapter 3

Positive  Basis  Theory  and  Clarke  Calculus 

Lewis and Torczon [93] show that convergence of GPS algorithms relies on the theory
of positive linear dependence of Davis [38]. Audet and Dennis [6] show that the
nonsmooth calculus of Clarke [31] also plays a vital role in further characterizing GPS
limit points. The purpose of this chapter is to provide some basic definitions and
theoretical results from both of these areas that will be used throughout the chapters
that follow. The final section contains some new, but straightforward results that
will be helpful in some of the GPS convergence proofs in later chapters.

3.1  Positive  Spanning  Sets 

The following terminology is due to Davis [38].

Definition 3.1 A positive combination of the set of vectors $V = \{v_i\}_{i=1}^{r}$
is a linear combination $\sum_{i=1}^{r} \alpha_i v_i$, where $\alpha_i \ge 0$, $i = 1, 2, \ldots, r$.

Definition 3.2 A finite set of vectors $W = \{w_i\}_{i=1}^{r}$ forms a positive
spanning set for $\mathbb{R}^n$ if every $v \in \mathbb{R}^n$ can be expressed as a positive
combination of vectors in $W$. The set of vectors $W$ is said to positively span
$\mathbb{R}^n$. The set $W$ is said to be a positive basis for $\mathbb{R}^n$ if no proper subset of
$W$ positively spans $\mathbb{R}^n$.

Davis [38] shows that the cardinality of any positive basis in $\mathbb{R}^n$ is between $n + 1$
and $2n$. Common choices for positive bases include the columns of $[I, -I]$ and $[I, -e]$,
where $I$ is the identity matrix and $e$ the vector of ones. In fact, for any basis $V \in \mathbb{R}^{n \times n}$,
the columns of $[V, -V]$ and $[V, -Ve]$ are easily shown to be positive bases.
The following key theorem, which is the motivation behind the use of positive
spanning  sets  in  GPS  algorithms  [93],  is  also  due  to  Davis  [38]. 

Theorem 3.3 A set $D$ positively spans $\mathbb{R}^n$ if and only if, for all nonzero
$v \in \mathbb{R}^n$, $v^T d > 0$ for some $d \in D$.

Proof. (Davis [38]) Suppose $D$ positively spans $\mathbb{R}^n$. Then for arbitrary nonzero
$v \in \mathbb{R}^n$, $v = \sum_{i=1}^{|D|} \alpha_i d_i$ for some $\alpha_i \ge 0$, $d_i \in D$, $i = 1, 2, \ldots, |D|$. Then
$0 < v^T v = \sum_{i=1}^{|D|} \alpha_i v^T d_i$, at least one term of which must be positive. Conversely, suppose $D$ does
not positively span $\mathbb{R}^n$; i.e., suppose that the set $D$ spans only a convex cone that is
a proper subset of $\mathbb{R}^n$. Then there exists a nonzero $v \in \mathbb{R}^n$ such that $v^T d \le 0$ for all
$d \in D$. ∎

It is easy to see that if $\nabla f(x)$ exists at $x$ and is nonzero, then, by choosing $v = -\nabla f(x)$, there exists a $d \in D$ such that $\nabla f(x)^T d < 0$. Thus at least one $d \in D$ is a descent direction.
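The characterization in Theorem 3.3 and the descent-direction remark can be probed numerically. The following is a minimal sketch, not part of the original text: the helper names (`maximal_basis`, `minimal_basis`, `best_direction`, `passes_sampled_test`) and the gradient vector `g` are hypothetical, and the random sampling is only a sanity check, not a proof that a set positively spans $\mathbb{R}^n$.

```python
import random


def identity_columns(n):
    """Columns of the n x n identity matrix, each as a list of floats."""
    return [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]


def maximal_basis(n):
    """Columns of [I, -I]: a positive basis with 2n elements."""
    eye = identity_columns(n)
    return eye + [[-c for c in col] for col in eye]


def minimal_basis(n):
    """Columns of [I, -e]: a positive basis with n + 1 elements."""
    return identity_columns(n) + [[-1.0] * n]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def best_direction(v, directions):
    """Return the direction d maximizing v^T d (first such d on ties)."""
    return max(directions, key=lambda d: dot(v, d))


def passes_sampled_test(directions, n, trials=2000, seed=0):
    """Sampled check of Theorem 3.3: every random v needs some d with v^T d > 0."""
    rng = random.Random(seed)
    for _ in range(trials):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if dot(v, best_direction(v, directions)) <= 0.0:
            return False
    return True


# Descent direction for a hypothetical gradient g: choose v = -g as in the text.
g = [3.0, -2.0, 0.5]
d = best_direction([-c for c in g], maximal_basis(3))
print(d, dot(g, d))  # dot(g, d) is negative, so d is a descent direction
```

The columns of $I$ alone fail the sampled test, since any $v$ with all components negative has $v^T d < 0$ for every coordinate direction; this is the situation the theorem rules out for positive spanning sets.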

3.2  The  Clarke  Calculus 

3.2.1  Basic  Definitions 

Basic terminology for the Clarke calculus is now provided. Definitions 3.4-3.8 below are classical definitions, which can be found in [31] and [133], among others. The remaining definitions are due to Clarke [31], except for Definition 3.12, which Clarke [31] attributes to Bourbaki without a citation. Although Clarke develops the more general case of a Banach space (with possibly infinite dimensions), we restrict ourselves here to $\mathbb{R}^n$, since GPS methods are not designed for infinite dimensions. Thus, all points referenced in this chapter lie in $\mathbb{R}^n$, unless otherwise specified.

We should also note that there are different definitions for differentiability of a function $f$ at a point $x \in \mathbb{R}^n$. The one given below in Definition 3.8 is referred to as Gâteaux differentiability, as opposed to the traditional Fréchet type (see [119]). However, they coincide if $f$ is Lipschitz near $x$ [31] (see Definition 3.5 below), which is the only case we are concerned with in this work.

Definition 3.4 Let $V$ be a subset of $\mathbb{R}^n$. A function $f : V \to \mathbb{R} \cup \{+\infty\}$ is said to be lower semi-continuous at $x \in V$ if $f(x) \le \liminf_{y \to x} f(y)$.

Definition 3.5 A function $f : Y \subset \mathbb{R}^n \to \mathbb{R}$ is said to satisfy a Lipschitz condition on $Y$ if there exists a scalar $L > 0$ such that, for all $x, x'$ in $Y$,

$$|f(x) - f(x')| \le L \|x - x'\|. \tag{3.1}$$

The function $f$ is said to be Lipschitz near $x \in Y$ if, for some $\epsilon > 0$, $f$ satisfies a Lipschitz condition on $B(x, \epsilon)$, where $B(x, \epsilon)$ is an open ball of radius $\epsilon$ centered at $x$.

Definition 3.6 A function $f : Y \subset \mathbb{R}^n \to \mathbb{R}$ is said to be convex on $Y$ if, for all $x, y \in Y$ and $\lambda \in [0, 1]$,

$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y). \tag{3.2}$$

The function $f$ is said to be convex near $x$ if it is convex on a ball of radius $\epsilon$ around $x$, for some $\epsilon > 0$.


Definition 3.7 Let $f : \mathbb{R}^n \to \mathbb{R}$. The one-sided directional derivative of $f$ at $x$ in the direction $v \in \mathbb{R}^n$ is given by

$$f'(x; v) := \lim_{t \downarrow 0} \frac{f(x + tv) - f(x)}{t} \tag{3.3}$$

when the limit exists.

Definition 3.8 A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be differentiable at $x$ if there exists a vector $\nabla f(x) \in \mathbb{R}^n$ such that, for every $v \in \mathbb{R}^n$, $f'(x; v)$ exists and equals $\nabla f(x)^T v$.

Definition 3.9 (Clarke) Let $f : \mathbb{R}^n \to \mathbb{R}$ be Lipschitz near a given point $x$. The generalized directional derivative of $f$ at $x$ in the direction $v$ is given by

$$f^\circ(x; v) := \limsup_{y \to x,\, t \downarrow 0} \frac{f(y + tv) - f(y)}{t}, \tag{3.4}$$

where $t$ is a positive scalar.

Definition 3.10 (Clarke) The generalized gradient of $f : \mathbb{R}^n \to \mathbb{R}$ at $x$ is defined to be the set

$$\partial f(x) := \{\zeta \in \mathbb{R}^n : f^\circ(x; v) \ge v^T \zeta \text{ for all } v \in \mathbb{R}^n\}. \tag{3.5}$$

Definition 3.11 (Clarke) The function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be regular at $x$ if, for each $v \in \mathbb{R}^n$, $f'(x; v)$ exists and coincides with $f^\circ(x; v)$.

Definition 3.12 (Bourbaki) The function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be strictly differentiable at $x$ if, for all $v \in \mathbb{R}^n$,

$$\lim_{y \to x,\, t \downarrow 0} \frac{f(y + tv) - f(y)}{t} = \nabla f(x)^T v. \tag{3.6}$$


3.2.2  Some  Theoretical  Results  of  Clarke 

At this point, we provide some of the theoretical results proved in [31]. The following results are listed without proof, since their proofs are outside the scope of this work. We assume again that all points and sets described here reside in $\mathbb{R}^n$, even though most of these results are proved in [31] for general Banach spaces.
• Let $f$ be Lipschitz near $x$. Then there is a neighborhood of $x$ in which $f$ is differentiable except on a subset $\Omega_f$ of Lebesgue measure 0.

• Let $f$ be Lipschitz near $x$, and let $S$ be any set of Lebesgue measure 0 in $\mathbb{R}^n$. Then $\partial f(x) = \mathrm{co}\{\lim \nabla f(x_k) : x_k \to x,\ x_k \notin S \cup \Omega_f\}$, where "co" denotes the convex hull of the specified set.

• The generalized directional derivative may be obtained from the generalized gradient as follows: $f^\circ(x; v) = \max\{v^T \zeta : \zeta \in \partial f(x)\}$.

• The generalized directional derivative, as a function of the direction $v$, is finite, positively homogeneous, and subadditive. Also, the relationship $f^\circ(x; -v) = (-f)^\circ(x; v)$ holds.

• If $x$ is a minimizer of $f$, and $f$ is Lipschitz near $x$, then $0 \in \partial f(x)$. This is a generalization of the standard first-order necessary condition for continuously differentiable $f$ that $\nabla f(x) = 0$.

• If $f$ is differentiable at $x$, then $\nabla f(x) \in \partial f(x)$.

• When $f$ is continuously differentiable at $x$, $\partial f(x)$ reduces to the singleton $\{\nabla f(x)\}$, and when $f$ is convex near $x$, it coincides with the subdifferential [73, 103]; i.e., $\partial f(x) = \{\eta \in \mathbb{R}^n : f(y) \ge f(x) + \eta^T (y - x) \text{ for all } y \in \mathbb{R}^n\}$.

• If $f$ is strictly differentiable at $x$, then $f$ is Lipschitz near $x$ and differentiable at $x$ with $\partial f(x) = \{\nabla f(x)\}$.

• If $f$ is Lipschitz near $x$ and $\partial f(x)$ reduces to a singleton $\{\zeta\}$, then $f$ is strictly differentiable at $x$ and $\nabla f(x) = \zeta$.

• If $f$ is Lipschitz near $x$ and is convex near $x$, then $f$ is regular at $x$.




The  final  result  of  this  section  is  stated  formally  and  with  proof  because  of  its 
importance  to  Section  3.3. 


Theorem 3.13 A function $f$ is Lipschitz near $x$, differentiable at $x$, and regular at $x$, if and only if $f$ is strictly differentiable at $x$.


Proof. (Clarke [31]) Suppose $f$ is Lipschitz near $x$, differentiable at $x$, and regular at $x$. Local Lipschitz continuity ensures that $f^\circ(x; v)$ exists for all $v \in \mathbb{R}^n$, and regularity ensures that $f'(x; v)$ exists and equals $f^\circ(x; v)$ for all $v \in \mathbb{R}^n$. Differentiability ensures that $\nabla f(x)$ exists and that $\nabla f(x)^T v = f'(x; v) = f^\circ(x; v)$ for all $v \in \mathbb{R}^n$. From this relationship and the properties above, we have the following:

$$\liminf_{y \to x,\, t \downarrow 0} \frac{f(y + tv) - f(y)}{t} = -\limsup_{y \to x,\, t \downarrow 0} \frac{f(y) - f(y + tv)}{t}$$

$$= -\limsup_{y \to x,\, t \downarrow 0} \frac{f(y + tv - tv) - f(y + tv)}{t}$$

$$= -f^\circ(x; -v) = -\nabla f(x)^T (-v) = \nabla f(x)^T v$$

$$= \limsup_{y \to x,\, t \downarrow 0} \frac{f(y + tv) - f(y)}{t}.$$

Since the limit infimum and limit supremum coincide, the limit exists and equals $\nabla f(x)^T v$. Thus, by definition, $f$ is strictly differentiable at $x$.

Conversely, suppose $f$ is strictly differentiable at $x$. Applying Definition 3.12 with $y = x$ establishes the differentiability of $f$ at $x$.

Now suppose $f$ is not Lipschitz near $x$. Then there exist sequences $\{x_k\}$ and $\{x'_k\}$ converging to $x$ such that both lie in the open ball of radius $1/k$ centered at $x$, and $|f(x'_k) - f(x_k)| > k \|x'_k - x_k\|$. Define $t_k > 0$ and $v_k \in \mathbb{R}^n$ such that $x'_k = x_k + t_k v_k$ with $\|v_k\| = 1/\sqrt{k}$. Since $x_k$ and $x'_k$ both converge to $x$, we have $t_k \downarrow 0$. Then, by substitution and a little algebra, we have

$$\frac{|f(x_k + t_k v_k) - f(x_k)|}{t_k} > \sqrt{k}. \tag{3.7}$$

Strict differentiability of $f$ ensures that, for any $\epsilon > 0$, there exists an $n_\epsilon$ such that, for all $k \ge n_\epsilon$, we have that

$$\left| \frac{f(x_k + t_k v_k) - f(x_k)}{t_k} - \nabla f(x)^T v_k \right| < \epsilon. \tag{3.8}$$

However, as $k$ gets arbitrarily large, $v_k \to 0$, making it impossible for (3.8) to hold, because by (3.7) the first term has magnitude exceeding $\sqrt{k}$, which becomes unbounded. This is a contradiction.

Therefore, $f$ is Lipschitz near $x$.

To show that $f$ is regular at $x$, we have by Definitions 3.8, 3.9, and 3.12 that for any $v \in \mathbb{R}^n$,

$$f'(x; v) = \nabla f(x)^T v = \lim_{y \to x,\, t \downarrow 0} \frac{f(y + tv) - f(y)}{t} = \limsup_{y \to x,\, t \downarrow 0} \frac{f(y + tv) - f(y)}{t} = f^\circ(x; v),$$

where the third equality is due to the fact that the limit exists and therefore equals the limit supremum. ■


3.2.3  Examples 

A  few  examples  are  now  provided  to  illustrate  the  Clarke  calculus  concepts  described 
in  Section  3.2.  The  first  example,  taken  from  Clarke  [31],  is  the  simple  absolute  value 
function,  and  is  included  to  simply  illustrate  the  basic  definitions. 

Example 3.14 Consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = |x|$. For $x > 0$, the definitions yield $f^\circ(x; v) = v$ and $\partial f(x) = \{1\}$, while, for $x < 0$, we get $f^\circ(x; v) = -v$ and $\partial f(x) = \{-1\}$. These results are entirely consistent with the classical results that $f'(x) = 1$ for $x > 0$, and $f'(x) = -1$ for $x < 0$. However, for $x = 0$, it is clear that $f$ is not differentiable; hence, it is neither strictly nor continuously differentiable. A straightforward application of the definition yields $f^\circ(0; v) = |v|$, and $f$ is regular, since $f^\circ(0; 1) = 1 = f'(0; 1)$ and $f^\circ(0; -1) = 1 = f'(0; -1)$.

By definition, $\partial f(0)$ contains all $\zeta$ satisfying $|v| \ge \zeta v$ for all $v$; i.e., $\partial f(0) = [-1, 1]$. Note that this interval contains 0, meaning that the first-order necessary condition (namely, $0 \in \partial f(0)$) is satisfied for 0 to be a minimizer of $f$, which indeed it is.
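The quantities in Example 3.14 can be approximated with finite difference quotients. The sketch below is an illustration under stated assumptions, not a method from the text: the function names, the sampling radius, and the step sizes are hypothetical choices, and taking a maximum over finitely many base points only approximates the limit supremum in (3.4).

```python
def one_sided_derivative(f, x, v, t=1e-8):
    """Difference quotient (f(x + t v) - f(x)) / t for one small t > 0,
    used as a stand-in for the limit in (3.3)."""
    return (f(x + t * v) - f(x)) / t


def clarke_derivative_estimate(f, x, v, radius=1e-3, samples=201, t=1e-6):
    """Largest difference quotient over base points y sampled in
    [x - radius, x + radius]; a finite-sample stand-in for (3.4)."""
    best = float("-inf")
    for j in range(samples):
        y = x - radius + 2.0 * radius * j / (samples - 1)
        best = max(best, (f(y + t * v) - f(y)) / t)
    return best


for v in (1.0, -1.0):
    # At x = 0 both estimates should be close to |v| for f = abs.
    print(v, one_sided_derivative(abs, 0.0, v), clarke_derivative_estimate(abs, 0.0, v))
```

For $f(x) = |x|$ at $x = 0$, both estimates agree with $|v|$ up to rounding, which is the numerical counterpart of the regularity claimed in the example.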

The next example, adapted from [6] and [31], shows a function that is differentiable at a given point, but not strictly differentiable there.

Example 3.15 Consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2 \left[2 + \sin\left(\frac{\pi}{x}\right)\right]$ with $f(0) = 0$. This function, pictured in Figure 3.1, has a global minimizer at $x = 0$ and possesses infinitely many local optima near 0. One can show that $f$ is differentiable everywhere, including at $x = 0$; thus it is also Lipschitz near $x = 0$. However, the derivative is not continuous at 0, and $f$ is not strictly differentiable at 0, since $\partial f(0) = [-\pi, \pi]$. Also, $f$ is not regular at 0, since $f^\circ(0; \pm 1) = \pi > 0 = f'(0; \pm 1)$, the latter because $f'(0) = 0$.
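The claim $\partial f(0) = [-\pi, \pi]$ in Example 3.15 can be made plausible by evaluating the classical derivative along the points $x = 1/k$. The sketch below assumes the closed form $f'(x) = 2x\,[2 + \sin(\pi/x)] - \pi \cos(\pi/x)$ for $x \ne 0$, obtained by the product and chain rules; the function names are hypothetical.

```python
import math


def f(x):
    """f(x) = x^2 (2 + sin(pi / x)) with f(0) = 0, as in Example 3.15."""
    return 0.0 if x == 0.0 else x * x * (2.0 + math.sin(math.pi / x))


def fprime(x):
    """Classical derivative: closed form for x != 0, and 0 at x = 0."""
    if x == 0.0:
        return 0.0
    return 2.0 * x * (2.0 + math.sin(math.pi / x)) - math.pi * math.cos(math.pi / x)


def derivatives_along(ks):
    """Evaluate f' at the points x = 1/k for each k in ks."""
    return [fprime(1.0 / k) for k in ks]


# cos(k pi) alternates in sign, so the gradients along x = 1/k have two limits.
print(derivatives_along([1000, 2000, 4000]))  # even k: values approach -pi
print(derivatives_along([1001, 2001, 4001]))  # odd k: values approach +pi
print(f(1e-6) / 1e-6)                         # difference quotient at 0, near 0
```

The two limiting gradient values $\pm\pi$ are the endpoints of the convex hull described in Section 3.2.2, while the difference quotient at 0 itself tends to $f'(0) = 0$.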

The  following  example,  taken  from  [6]  and  [73],  shows  a  function  that  is  strictly 
differentiable  at  a  given  point,  but  not  continuously  differentiable  there. 


Example 3.16 Consider the convex function $f : \mathbb{R} \to \mathbb{R}$, depicted in Figure 3.2, defined by $f(x) = \int_0^x \varphi(u)\, du$, where

$$\varphi(u) = \begin{cases} u & \text{if } u \le 0, \\ \frac{1}{k+1} & \text{if } \frac{1}{k+1} < u \le \frac{1}{k},\ k \in \mathbb{Z}_+. \end{cases}$$

The function $f$ is Lipschitz near $x = 0$. It is shown in [73] that $f$ has kinks at $\frac{1}{k}$, $k = 1, 2, \ldots$, with $\partial f\left(\frac{1}{k}\right) = \left[\frac{1}{k+1}, \frac{1}{k}\right]$. Since 0 is a limit point of the sequence $\left\{\frac{1}{k}\right\}$, $f$ is not strictly differentiable in any neighborhood of $x = 0$. The corollary of Proposition 2.2.4 in [31] states that a Lipschitz function $f$ is continuously differentiable in a neighborhood of $x$ if and only if it is strictly differentiable there. Thus, $f$ is not continuously differentiable near $x = 0$. Furthermore, $\partial f(0)$ reduces to the singleton $\{0\}$, and the same Proposition ensures that $f$ is strictly differentiable at $x = 0$.

Figure 3.1 A Differentiable Function that is not Strictly Differentiable




Figure  3.2  A  Strictly,  but  not  Continuously  Differentiable  Function 
3.3 New Theoretical Results

The results presented in this section are stated here formally for the first time, although some of the ideas have been shown in proofs by Audet and Dennis [6] in the context of certain limit points of the GPS algorithm. We first define a new concept of $D$-regularity and relate it to some other concepts from the Clarke calculus. The final theorem will be very useful in later convergence proofs.

Definition 3.17 Let $D$ be a set of directions in $\mathbb{R}^n$. A function $f$ is said to be $D$-regular at $x \in \mathbb{R}^n$ if, for all $d \in D$, $f'(x; d)$ exists and $f'(x; d) = f^\circ(x; d)$.

Clearly, a function that is regular at a point is also $D$-regular there for any $D \subset \mathbb{R}^n$. Lemma 3.18 below provides additional conditions by which a $D$-regular function can be shown to be regular. This result is then used to relate $D$-regularity to strict differentiability in Theorem 3.19.

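Lemma 3.18 Let $f$ be differentiable at $x$. Then $f$ is regular at $x$ if and only if $f$ is $D$-regular at $x$, where $D$ is a positive spanning set.

Proof. Let $f$ be regular at $x$. Then, by definition, $f'(x; v)$ exists and equals $f^\circ(x; v)$ for all $v \in \mathbb{R}^n$, and $f$ is $D$-regular for any positive spanning set $D$.

Conversely, suppose $f$ is differentiable and $D$-regular at $x$, with $D$ a positive spanning set. Let $v \in \mathbb{R}^n$ be arbitrary. Then $v = \sum_{i=1}^{|D|} \alpha_i d_i$ for some $\alpha_i \ge 0$ and $d_i \in D$, $i = 1, 2, \ldots, |D|$, and by the assumptions and properties of the specified derivatives,

$$f'(x; v) \le f^\circ(x; v) \le \sum_{i=1}^{|D|} \alpha_i f^\circ(x; d_i) = \sum_{i=1}^{|D|} \alpha_i f'(x; d_i) = \sum_{i=1}^{|D|} \alpha_i \nabla f(x)^T d_i = \nabla f(x)^T v = f'(x; v),$$

where the second inequality uses the positive homogeneity and subadditivity of $f^\circ(x; \cdot)$. Equality therefore holds throughout, so $f'(x; v) = f^\circ(x; v)$. Thus, $f$ is regular at $x$. ■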

Theorem 3.19 Let the function $f$ be strictly differentiable at $x \in \mathbb{R}^n$. Then $f$ is Lipschitz near $x$, differentiable at $x$, and $D$-regular at $x$ for any set of directions $D$ in $\mathbb{R}^n$. Conversely, if $f$ is Lipschitz near $x$, differentiable at $x$, and $D$-regular at $x$, where $D$ is a positive spanning set for $\mathbb{R}^n$, then $f$ is strictly differentiable at $x$.

Proof.  This  follows  directly  from  Theorem  3.13  and  Lemma  3.18.  ■ 

Example  3.20  shows  that  the  converses  of  Lemma  3.18  and  Theorem  3.19  are  not 
necessarily  true  if  D  is  not  a  positive  spanning  set.  That  is,  it  shows  a  differentiable 
function  that  is  D-regular,  but  not  regular  because  D  is  not  a  positive  spanning  set. 


Example 3.20 Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$, defined by

$$f(x, y) = \begin{cases} x^2 \left[2 + \sin\left(\frac{\pi}{x}\right)\right], & \text{if } x \ne 0, \\ 0, & \text{if } x = 0. \end{cases} \tag{3.9}$$

This function has the same properties as that of Example 3.15, except that for $d = (0, \pm 1)$, $f^\circ(0; d) = f'(0; d) = 0$. Thus, as in Example 3.15, $f$ is differentiable (hence Lipschitz) everywhere, but not regular or strictly differentiable at $x = 0$. However, $f$ is $D$-regular at 0 for $D = \{(0, \pm 1)\}$, which does not positively span $\mathbb{R}^2$.


Theorem 3.21 below essentially establishes first-order necessary conditions for optimality with respect to the continuous variables in a mixed variable domain. The assumptions on $f$ given here are slightly weaker than the strict differentiability assumption used in [6] to establish first-order results for GPS limit points, but only in the constrained case. In the unconstrained case, the assumptions on $f$ given here are equivalent to strict differentiability by Theorem 3.19. Although simply assuming strict differentiability would make the theorem statement shorter, the assumptions made here more clearly convey the relationship between the constrained and unconstrained cases.

However, we first introduce new notation, so that $f'(x; (d, 0))$ denotes the directional derivative at $x$ with respect to the continuous variables in the direction $d \in \mathbb{R}^{n^c}$ (i.e., while holding the discrete variables constant, hence the $0 \in \mathbb{Z}^{n^d}$), $f^\circ(x; (d, 0))$ denotes the Clarke generalized directional derivative at $x$ with respect to the continuous variables, and $\partial_c f(x)$ represents the generalized gradient of $f$ at $x$ with respect to the continuous variables. This convention is used throughout Chapters 5 and 6 as well.


Theorem 3.21 Let $x = (x^c, x^d) \in X \subset \mathbb{R}^{n^c} \times \mathbb{Z}^{n^d}$. Let the function $f$ satisfy the following conditions:

1. $f$ is Lipschitz near $x$ with respect to the continuous variables,

2. $f$ is differentiable at $x$ with respect to the continuous variables,

3. $f$ is $D$-regular at $x$ with respect to the continuous variables for a set of vectors $D$ in $\mathbb{R}^{n^c}$.

If $D$ positively spans the tangent cone $T_{X^c}(x)$, and if $f^\circ(x; (d, 0)) \ge 0$ for all $d \in D \cap T_{X^c}(x)$, then $\nabla_c f(x)^T v \ge 0$ for all $v \in T_{X^c}(x)$. Thus, $x$ is a KKT point of $f$ with respect to the continuous variables. Moreover, if $X^c = \mathbb{R}^{n^c}$ or if $x^c$ lies in the interior of $X^c$, then $f$ is strictly differentiable at $x$ with respect to the continuous variables and $0 = \nabla_c f(x) \in \partial_c f(x)$.

Proof. Under the hypotheses given, let $D$ be a set of vectors that positively spans $T_{X^c}(x)$, and let $v \in T_{X^c}(x)$ be arbitrary. Then $v = \sum_{i=1}^{|D|} \alpha_i d_i$ for some $\alpha_i \ge 0$ and $d_i \in D$, $i = 1, 2, \ldots, |D|$. Then

$$\nabla_c f(x)^T v = \sum_{i=1}^{|D|} \alpha_i \nabla_c f(x)^T d_i = \sum_{i=1}^{|D|} \alpha_i f'(x; (d_i, 0)) = \sum_{i=1}^{|D|} \alpha_i f^\circ(x; (d_i, 0)) \ge 0,$$

since all the terms of the final sum are nonnegative.

If $X^c = \mathbb{R}^{n^c}$ or if $x^c$ lies in the interior of $X^c$, then $T_{X^c}(x) = \mathbb{R}^{n^c}$, and $D$ is a positive spanning set; thus, by Theorem 3.19, $f$ is strictly differentiable at $x$. Furthermore, we have $\nabla_c f(x)^T v \ge 0$ for all $v \in \mathbb{R}^{n^c}$, and since this holds for $-v$ as well, we also have $\nabla_c f(x)^T v \le 0$ for all $v \in \mathbb{R}^{n^c}$. Therefore, $0 = \nabla_c f(x) \in \partial_c f(x)$ (the last step because $\partial_c f(x)$ always contains $\nabla_c f(x)$, if it exists). ■
Chapter 4

GPS  for  NLP  Problems 

The purpose of this chapter is to lay out the details of two GPS algorithms for NLP problems. The first section describes the basic GPS algorithm for NLP problems having finitely many linear constraints, while the second details the filter GPS algorithm for NLP problems with general nonlinear constraints. Each section contains a description of the algorithm and a summary of convergence results.

We should note that, since this chapter is meant to clarify already existing algorithms and results, the convergence results remain exactly the same as those found in [6, 7], with the exception of some revised terminology and notation. This is done despite our recent discovery that the key hypothesis of strict differentiability can be slightly weakened in a few of the important theorems without weakening the results. However, the new hypothesis, which is based on that of Theorem 3.21, is used extensively in Chapters 5 and 6 for the more general classes of algorithms (e.g., see Theorems 5.16, 5.17, and 5.20 in the next chapter).

4.1  GPS  for  Linearly  Constrained  NLP  Problems 

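In this section, we consider the NLP problem,

$$\min_{x \in X} f(x), \tag{4.1}$$

where $f : X \to \mathbb{R} \cup \{\infty\}$, $X = \{x \in \mathbb{R}^n : \ell \le Ax \le u\}$, $A \in \mathbb{R}^{m \times n}$ is a real matrix, $\ell, u \in (\mathbb{R} \cup \{\pm\infty\})^m$, and $\ell \le u$.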

A key point is that if an iterate falls outside of the domain $X$, it is simply ignored. This differs in construction from that of [6], [7], and [8], where the algorithm is applied to $f_X = f + \psi_X$ rather than $f$, where $\psi_X$ is the indicator function for $X$; that is, it is zero for any point in $X$ and $\infty$ outside of $X$. For the remainder of this document, we use the alternative (but equivalent) construction of ignoring or not considering points outside of $X$.

We should note also that while we allow $\ell = u$ in the formulation, equality constraints are problematic in practice because points outside of $X$ are not evaluated by the algorithm, and any roundoff error would eliminate feasible points from being considered. Thus, to use these methods in practice, variables ought to be eliminated until all the constraints can be expressed as $\ell < u$.

The GPS algorithm is a direction-based method, whose convergence theory is based on searching in directions that form a positive spanning set. A finite set of positive spanning directions $D$ is used to construct (theoretically) a mesh on which GPS iterates lie. This is based on the positive basis theory of Davis [38], and is done with the idea that, if $f$ is sufficiently smooth in a neighborhood of $x_k$ and $\nabla f(x_k) \ne 0$, then at least one direction $d \in D$ must be a descent direction (see Theorem 3.3). Thus, for some sufficiently small value of $\Delta_k > 0$, there exists a point $y = x_k + \Delta_k d$ satisfying $f(y) < f(x_k)$. Specifically, at iteration $k$, the mesh $M_k$ is defined as:

$$M_k = \{x_k + \Delta_k D z \in X : z \in \mathbb{Z}_+^{|D|}\}, \tag{4.2}$$

where $x_k$ is the current iterate, $\Delta_k > 0$ is a parameter that controls the fineness of the mesh, and $\mathbb{Z}_+$ is the set of nonnegative integers. For convenience, the positive spanning set $D$ is represented here as a real $n \times |D|$ matrix whose columns are the vectors in the set.

The positive spanning directions must also satisfy the mild restriction,

$$D = GZ, \tag{4.3}$$

where $G \in \mathbb{R}^{n \times n}$ is a nonsingular generating matrix, and $Z \in \mathbb{Z}^{n \times |D|}$. A common choice for $G$ is the identity matrix. This construct is essential for the convergence theory and will be discussed in greater detail in Chapter 5. An example of directions not constructed in this manner is provided in [6], in which convergence cannot be guaranteed. Coope and Price [35] use a less restrictive construct in their grid-based methods, but they require that the mesh size be explicitly forced to converge to zero (i.e., it is a hypothesis in their convergence theory).

4.1.1  The  Basic  GPS  Algorithm 

The GPS algorithm is formally stated in Figure 4.1. Each iteration is characterized by an optional global SEARCH step and a local POLL step. In the SEARCH step, the objective function $f$ is evaluated at a finite number of points lying on the current mesh $M_k$ in an attempt to find a new point with a better function value than the incumbent. Any strategy may be used (including none), as long as the number of mesh points it evaluates is finite. The user can apply a favorite heuristic, such as those described in Section 2.2, or perhaps optimize an inexpensive surrogate function, as is common in difficult engineering design problems with expensive function evaluations [5, 19, 20, 21].

In the POLL step, a positive spanning set $D_k \subseteq D$ is chosen from which to construct the poll set. Again, we represent $D_k$ also as a matrix whose columns are the members of the set. It is a function of $k$ and $x_k$; i.e., $D_k = D(k, x_k) \subseteq D$. The poll set $P_k$ is constructed as the neighboring mesh points in each of the directions in $D_k$; i.e.,

$$P_k = \{x_k + \Delta_k d \in X : d \in D_k\}. \tag{4.4}$$

The function $f$ is evaluated at points in $P_k$ until the points have all been evaluated, or until one with a lower objective function value is found.
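The mesh (4.2) and poll set (4.4) can be made concrete with a few lines of code. The following is a minimal sketch under stated assumptions: the function names, the `feasible` predicate standing in for membership in $X$, and the truncation parameter `max_coeff` (the true mesh is infinite) are all hypothetical and chosen only for illustration.

```python
from itertools import product


def poll_set(x, delta, directions, feasible=lambda y: True):
    """P_k = {x_k + Delta_k d in X : d in D_k}, following (4.4)."""
    points = []
    for d in directions:
        y = tuple(xi + delta * di for xi, di in zip(x, d))
        if feasible(y):  # points outside X are simply ignored
            points.append(y)
    return points


def mesh_points(x, delta, directions, max_coeff, feasible=lambda y: True):
    """Finite piece of the mesh (4.2): x + Delta * D z with 0 <= z_j <= max_coeff."""
    n = len(x)
    points = set()
    for z in product(range(max_coeff + 1), repeat=len(directions)):
        step = [sum(zj * d[i] for zj, d in zip(z, directions)) for i in range(n)]
        y = tuple(x[i] + delta * step[i] for i in range(n))
        if feasible(y):
            points.add(y)
    return points


# Example: D = [I, -I] in two dimensions, with X = {x : x_1 >= 0}.
D = [(1, 0), (0, 1), (-1, 0), (0, -1)]
in_X = lambda y: y[0] >= 0.0
poll = poll_set((0.0, 0.0), 0.5, D, in_X)
mesh = mesh_points((0.0, 0.0), 0.5, D, 1, in_X)
print(poll)
print(sorted(mesh))
```

Every poll point is also a mesh point, obtained by taking $z$ to be a unit coordinate vector in (4.2).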




The set of trial points is defined as $T_k = S_k \cup P_k$, where $S_k$ is the finite set of mesh points evaluated during the SEARCH step. The following definitions define the two possible outcomes of the SEARCH and POLL steps.

Definition 4.1 If $f(y) < f(x_k)$ for some $y \in T_k$, then $y$ is said to be an improved mesh point.

Definition 4.2 If $f(x_k) \le f(y)$ for all $y \in P_k$, then $x_k$ is said to be a mesh local optimizer.

If either the SEARCH or POLL step is successful at finding an improved mesh point, then it becomes the new incumbent $x_{k+1}$, and the mesh is coarsened according to the rule,

$$\Delta_{k+1} = \tau^{m_k^+} \Delta_k, \tag{4.5}$$

where $\tau > 1$ is rational and fixed over all iterations, and the integer $m_k^+$ satisfies $0 \le m_k^+ \le m_{\max}$ for some fixed integer $m_{\max} \ge 0$. Coarsening of the mesh is designed to allow the algorithm to possibly skip over certain local minima and find a better one. It does not prevent convergence of the algorithm, and often makes it faster. Note that only a simple decrease in the objective function value is needed, as is proved in [130].

If the SEARCH and POLL steps both fail to find an improved mesh point, then the incumbent is a mesh local optimizer and remains unchanged (or, alternatively, can be chosen as a point having the same function value as the incumbent, if one exists), while the mesh is refined according to the rule,

$$\Delta_{k+1} = \tau^{m_k^-} \Delta_k, \tag{4.6}$$

where $\tau > 1$ is defined above, $\tau^{m_k^-} \in (0, 1)$, and the integer $m_k^-$ satisfies $m_{\min} \le m_k^- \le -1$ for some fixed integer $m_{\min}$.

It follows that, for any integer $k \ge 0$, there exists an integer $r_k$ such that

$$\Delta_k = \tau^{r_k} \Delta_0. \tag{4.7}$$
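Relation (4.7) follows because each update multiplies the mesh size by an integer power of $\tau$, so the exponents simply accumulate. The sketch below checks this in exact rational arithmetic; the function names, the fixed exponents `m_plus=1` and `m_minus=-1`, and the particular success/failure pattern are hypothetical choices made for illustration.

```python
from fractions import Fraction


def update_mesh_size(delta, tau, improved, m_plus=1, m_minus=-1):
    """Coarsen by tau**m_plus after an improved mesh point (4.5);
    otherwise refine by tau**m_minus (4.6). Returns (new delta, exponent)."""
    exponent = m_plus if improved else m_minus
    return delta * tau ** exponent, exponent


def run_updates(delta0, tau, outcomes):
    """Apply a sequence of success/failure outcomes; track r_k from (4.7)."""
    delta, r = delta0, 0
    for improved in outcomes:
        delta, m = update_mesh_size(delta, tau, improved)
        r += m
    return delta, r


delta0, tau = Fraction(1), Fraction(2)
delta, r = run_updates(delta0, tau, [True, False, False, True, False])
print(delta, r, delta == tau ** r * delta0)
```

Using `Fraction` keeps the check exact, which mirrors the requirement in the text that $\tau$ be rational.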


Generalized Pattern Search Algorithm - GPS

Initialization: Let $x_0$ be such that $f(x_0)$ is finite, and let $M_0 \subset X$ be the mesh defined by $\Delta_0 > 0$ and $x_0$ (see (4.2)).

For $k = 0, 1, 2, \ldots$, perform the following:

1. SEARCH step: Employ some finite strategy seeking an improved mesh point; i.e., $x_{k+1} \in M_k$ such that $f(x_{k+1}) < f(x_k)$.

2. POLL step: If the SEARCH step was unsuccessful, evaluate $f$ at points in the poll set $P_k$ until an improved mesh point $x_{k+1}$ is found (or until done).

3. Update: If SEARCH or POLL finds an improved mesh point,

Update $x_{k+1}$, and set $\Delta_{k+1} \ge \Delta_k$ according to (4.5);

Otherwise, set $x_{k+1} = x_k$, and set $\Delta_{k+1} < \Delta_k$ according to (4.6).


Figure 4.1 Basic GPS Algorithm
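The loop in Figure 4.1 can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the software described in this work: the SEARCH step is left empty, the poll directions are fixed at $D_k = [I, -I]$, the exponents $m_k^+ = 0$ and $m_k^- = -1$ are held constant, and the stopping rule on `delta_tol` and the `feasible` predicate are hypothetical additions.

```python
def gps_minimize(f, x0, delta0=1.0, tau=2.0, m_plus=0, m_minus=-1,
                 feasible=lambda y: True, delta_tol=1e-6, max_iter=10000):
    """Basic GPS with an empty SEARCH step and POLL directions [I, -I]."""
    n = len(x0)
    directions = []
    for i in range(n):
        for sign in (1.0, -1.0):
            d = [0.0] * n
            d[i] = sign
            directions.append(d)

    x, fx, delta = list(x0), f(x0), delta0
    for _ in range(max_iter):
        if delta < delta_tol:  # assumed stopping rule, not part of Figure 4.1
            break
        improved = False
        for d in directions:  # POLL step: stop at the first improved mesh point
            y = [xi + delta * di for xi, di in zip(x, d)]
            if not feasible(y):  # points outside X are not evaluated
                continue
            fy = f(y)
            if fy < fx:  # simple decrease is enough
                x, fx, improved = y, fy, True
                break
        # Update: coarsen (4.5) on success, refine (4.6) otherwise.
        delta *= tau ** (m_plus if improved else m_minus)
    return x, fx, delta


quadratic = lambda z: (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2
print(gps_minimize(quadratic, (0.0, 0.0)))
print(gps_minimize(quadratic, (0.0, 0.0), feasible=lambda y: y[1] >= -1.0))
```

In the second call the bound $x_2 \ge -1$ is handled only by discarding infeasible poll points, matching the construction of ignoring points outside of $X$.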

To handle bound and linear constraints and still guarantee the convergence results of the basic algorithm, the only requirement is that the directions in $D$ be sufficiently rich to ensure that polling directions can be chosen that conform to the geometry of the constraint boundaries, and that these directions be used infinitely many times. Figure 4.2 depicts a set of conforming directions in $\mathbb{R}^2$, similar to a drawing in [95]. Audet and Dennis [6] abstract this notion of conformity in Definition 4.3, but appeal to the construction of Lewis and Torczon [95], who actually provide an algorithm for choosing conforming directions using standard linear algebra tools. A slightly restructured but equivalent version of this algorithm is given in Figure 4.3. Note that this algorithm will not work if any of the active constraints are degenerate.


Figure  4.2  Directions  that  Conform  to  the  Geometry  of  X 


Definition 4.3 A rule for selecting the positive spanning sets $D_k = D(k, x_k) \subseteq D$ conforms to $X$ for some $\epsilon > 0$, if at each iteration $k$ and for each $y$ in the boundary of $X$ for which $\|y - x_k\| < \epsilon$, the tangent cone $T_X(y)$ is generated by nonnegative linear combinations of a subset of the columns of $D_k$.

4.1.2  Summary  of  Convergence  Results 

Convergence  results  for  the  basic  GPS  algorithm  are  presented  in  detail  in  [8].  They 
are  only  summarized  here,  since  the  full  theoretical  results  are  presented  for  more 
general  cases  of  the  algorithm  in  Chapters  5  and  6. 
Algorithm for Computing Conforming Directions $D_k$

• Set $\epsilon_k \ge \epsilon > 0$. Assume the current iterate $x_k$ satisfies $\ell \le A x_k \le u$.

• While $\epsilon_k \ge \epsilon$, do the following:

1. Let $I_\ell(x_k, \epsilon_k) = \{i : a_i^T x_k - \ell_i \le \epsilon_k\}$

2. Let $I_u(x_k, \epsilon_k) = \{i : u_i - a_i^T x_k \le \epsilon_k\}$

3. Let $V$ denote the matrix whose columns are formed by all the members of the set $\{-a_i : i \in I_\ell(x_k, \epsilon_k)\} \cup \{a_i : i \in I_u(x_k, \epsilon_k)\}$, where $a_i^T$ denotes the $i$-th row of $A$.

4. If $V$ does not have full column rank, then reduce $\epsilon_k$ just until $|I_\ell(x_k, \epsilon_k)| + |I_u(x_k, \epsilon_k)|$ is decreased, and return to step 1.

• Set $B = V (V^T V)^{-1}$ and $N = I - V (V^T V)^{-1} V^T$.

• Set $D_k = [N, -N,$


Figure  4.3  Algorithm  for  Generating 
Conforming  Directions  (Lewis  and  Torczon) 


To ensure convergence, the following mild assumptions are made:

A1: All iterates $\{x_k\}$ produced by the algorithm lie in a compact set.

A2: The set of directions $D = GZ$, as defined in (4.3), includes tangent cone generators for every point in $X$.

A3: The rule for selecting positive spanning sets $D_k$ conforms to $X$ for some $\epsilon > 0$.

Assumption A1 is already sufficient to guarantee that the iteration sequence has convergent subsequences, and it is, in fact, a standard assumption [6, 7, 8, 32, 35, 47, 94, 95, 130]. A sufficient condition for this to hold is that the level set $L(x_0) = \{x \in X : f(x) \le f(x_0)\}$ is compact. We can assume that $L(x_0)$ is bounded, but not closed, since we allow $f$ to be discontinuous and extended valued. Thus we can assume that the closure of $L(x_0)$ is compact. We should also note that most real engineering optimization problems have simple bounds on the design variables, which is enough to ensure that Assumption A1 is satisfied, since iterates lying outside of $X$ are not evaluated. In the unconstrained case, note that Assumptions A2 and A3 are automatically satisfied by any positive spanning set.

Assumption A2 is automatically satisfied if $G = I$ and the constraint matrix $A$ is rational, as is the case in [95]. Note that the finite number of linear constraints ensures that the set of tangent cone generators for all points in $X$ is finite, which ensures that the finiteness of $D$ is not violated. However, this is not the case for nonlinear constraints, which is the subject of Section 4.2.

If $f$ is lower semi-continuous at any GPS limit point $\hat{x}$, then $f(\hat{x}) \le \lim_k f(x_k)$, with equality if $f$ is continuous. Of particular interest are limit points of subsequences for which $\lim_{k \in K} \Delta_k = 0$, which we know exist because of Torczon's [130] key result that $\liminf_{k \to \infty} \Delta_k = 0$. The Torczon result is discussed in great detail in Chapter 5. This leads to the following definitions that will be used throughout the remainder of this document.

Definition 4.4 A subsequence of GPS mesh local optimizers $\{x_k\}_{k \in K}$
(for some subset of indices $K$) is said to be a refining subsequence if
$\{\Delta_k\}_{k \in K}$ converges to zero.

Definition 4.5 Let $\hat{x}$ be a limit point of a refining subsequence $\{x_k\}_{k \in K}$.
A direction $d \in D$ is said to be a limit direction of $\hat{x}$ if $w_k = x_k + \Delta_k d \in X$
and $f(x_k) \le f(w_k)$ for infinitely many $k \in K$.

Audet and Dennis [8] prove the existence of at least one convergent refining
subsequence. An important point is that, since a limit direction $d$ is one in which
$x_k + \Delta_k d \in X$ infinitely often, it must be a feasible direction at $\hat{x}$, and thus
lies in the tangent cone $T_X(\hat{x})$.

The key results of Audet and Dennis are now given. The first shows directional
optimality conditions under the assumption of Lipschitz continuity, and is obtained
by a very short and elegant proof using Clarke's [31] definition of the generalized
directional derivative. Following this result is an example showing a very simple convex
function, in which the directional optimality conditions hold, but no stronger result
can be obtained. The final two results in this section, which follow from Theorem 3.21,
show convergence to points satisfying first-order necessary optimality conditions
under the assumption of strict differentiability.

Theorem 4.6 Let $\hat{x}$ be a limit of a refining subsequence, and let $d \in D$
be any limit direction of $\hat{x}$. Under Assumptions A1-A3, if $f$ is Lipschitz
near $\hat{x}$, then the generalized directional derivative of $f$ at $\hat{x}$ in the direction
$d$ is nonnegative, i.e., $f^\circ(\hat{x}; d) \ge 0$.
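For reference, the generalized directional derivative that appears in Theorem 4.6 is standardly defined, for a function $f$ that is Lipschitz near $x$, as shown below. The auxiliary symbols $y$ and $t$ are introduced here only for this definition.

```latex
f^\circ(x; d) \;=\; \limsup_{y \to x,\; t \downarrow 0} \frac{f(y + t d) - f(y)}{t}.
```

When $f$ is strictly differentiable at $x$, this quantity reduces to $f^\circ(x; d) = \nabla f(x)^T d$, which is what connects the directional result to gradient-based optimality conditions.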

Example 4.7 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x) = \|x\|_\infty$. Let $x_0 = (1, 1)$
and $D = \{\pm e_1, \pm e_2\}$, and choose an empty SEARCH step. For any
value of $\Delta > 0$ and for all $d \in D$, it is not difficult to see that $x_0 + \Delta d$
cannot yield an improved mesh point. (The directions $e_1$ and $e_2$ are ascent
directions, while the directions $-e_1$ and $-e_2$ are parallel to the level sets
of $f$.) Thus, the algorithm will stall at $\hat{x} = (1, 1)$ without moving.

Figure 4.4 An Illustrative Example of Directional Optimality (showing the level set $\{x : f(x) = 1\}$ and the point $x_0 = (1, 1)$)

It is easy to show that $f$ is Lipschitz everywhere, but not differentiable at
$\hat{x} = (1, 1)$. Furthermore, it is clear that Theorem 4.6 holds; i.e., $f^\circ(\hat{x}; d) \ge 0$
for each $d \in D$. However, any direction that lies in the interior of
the cone generated by $-e_1$ and $-e_2$ is a descent direction (none of these
directions belong to $D$).

Note  that  the  specific  choices  of  x0  and  D  force  this  unusual  behavior, 
which  can  be  remedied  by  incorporating  either  a  slight  perturbation  in 
the  starting  point  or  poll  directions,  or  any  reasonable  nonempty  search 
step.  ■ 
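The stalling behavior in Example 4.7 can be checked numerically. The following is a minimal sketch, not the thesis software: the helper name `poll` and the sampled step sizes are assumptions made only for illustration.

```python
def f(x):
    """Infinity norm of a 2-vector, as in Example 4.7."""
    return max(abs(x[0]), abs(x[1]))

def poll(x, delta, directions):
    """Return the poll points x + delta*d that strictly improve on f(x)."""
    fx = f(x)
    trial = [(x[0] + delta * d[0], x[1] + delta * d[1]) for d in directions]
    return [y for y in trial if f(y) < fx]

x0 = (1.0, 1.0)
D = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # the set {+e1, -e1, +e2, -e2}

# No coordinate direction yields an improved point for any sampled step size.
stalled = all(poll(x0, 2.0 ** -j, D) == [] for j in range(20))

# A direction in the interior of the cone generated by -e1 and -e2 does descend.
diagonal_improves = poll(x0, 0.5, [(-1, -1)]) == [(0.5, 0.5)]
```

Both flags evaluate to `True`, which matches the claim that the choice of $x_0$ and $D$, rather than the function itself, causes the stall.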

Theorem 4.8 Under Assumptions A1-A3, if $f$ is strictly differentiable
at a limit point $\hat{x}$ of a refining subsequence, then $\nabla f(\hat{x})^T w \ge 0$ for all
$w \in T_X(\hat{x})$, and $-\nabla f(\hat{x}) \in N_X(\hat{x})$. Thus, $\hat{x}$ satisfies the KKT first-order
necessary conditions for optimality.

Corollary 4.9 Under Assumption A1, let $X = \mathbb{R}^n$ and $\hat{x}$ be any limit of
a refining subsequence. If $f$ is strictly differentiable at $\hat{x}$, then $\nabla f(\hat{x}) = 0$.

4.2  GPS  for  General  Constrained  NLP  Problems 

Nonlinear constraints pose a problem for the basic GPS algorithm in that choosing
enough directions to conform to the geometry of the constraints (to guarantee
convergence to a KKT point) would require an infinite number of directions in $D$, which
the convergence theory does not support. Thus, for NLP problems with nonlinear
constraints, a different strategy must be employed.

To address nonlinear constraints, Lewis and Torczon [96] introduce a derivative-free
augmented Lagrangian pattern search method, based on the approach developed
by Conn, Gould, and Toint [32], for solving NLP problems in which the functions are
continuously differentiable. They are able to prove convergence to a KKT point, under the
assumption of continuous differentiability of the objective and constraint functions on
the domain of the problem. However, the method requires estimates of Lagrange
multipliers and a penalty parameter that must be updated at every iteration. Currently,
no numerical performance results have been published.

As an alternative, Fletcher and Leyffer [47] developed the filter algorithm as a
way to globalize SQP and SLP methods without the use of a merit or penalty function,
which would require the user to specify the relative weighting of optimality versus
feasibility. In filter algorithms, the goal is to minimize two functions, the objective $f$
and a continuous aggregate constraint violation function $h$ that satisfies $h(x) \ge 0$ with
$h(x) = 0$ if and only if $x$ is feasible. The function $h$ is often set to $h(x) = \|C(x)_+\|$,
where $\|\cdot\|$ is a vector norm and $C(x)_+$ is the vector of constraint violations at $x$; i.e.,
for $i = 1, 2, \ldots, m$, $C_i(x)_+ = C_i(x)$ if $C_i(x) > 0$; otherwise, $C_i(x)_+ = 0$. In particular,
if the squared 2-norm is used, then $h$ inherits whatever smoothness properties $C$
possesses [7].
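As an illustration, the squared 2-norm version of $h$ can be sketched as follows. The function names and the two example constraints are hypothetical, and constraints are assumed to be written in the form $C_i(x) \le 0$.

```python
def constraint_violation(c_values):
    """Squared 2-norm of the positive parts of the constraint values C(x).

    Constraints are assumed to have the form C_i(x) <= 0, so only
    positive entries count as violations.
    """
    return sum(max(c, 0.0) ** 2 for c in c_values)

def C(x):
    # Hypothetical constraints: x1^2 + x2^2 - 1 <= 0 and -x1 <= 0.
    return [x[0] ** 2 + x[1] ** 2 - 1.0, -x[0]]

h_feasible = constraint_violation(C((0.5, 0.5)))    # both constraints hold
h_infeasible = constraint_violation(C((2.0, 0.0)))  # first constraint equals 3
```

Here `h_feasible` is zero and `h_infeasible` is 9, the square of the single violated constraint value.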

4.2.1  The  Filter  GPS  Algorithm 

The concept of a filter is based on a notion of dominance. The definition of
dominance provided below, which comes from the multi-criteria optimization literature, is
adapted from a similar term in [47], so that it is defined with respect to the objective
function $f$ and constraint violation function $h$. A formal definition of a filter follows
immediately thereafter.

Definition 4.10 A point $x \in \mathbb{R}^n$ is said to dominate $y \in \mathbb{R}^n$, written
$x \prec y$, if $f(x) \le f(y)$ and $h(x) \le h(y)$ with either $f(x) < f(y)$ or
$h(x) < h(y)$.

Definition 4.11 A filter, denoted $\mathcal{F}$, is a finite set of points such that
no pair of points $x$ and $y$ in the set have the relation $x \prec y$.

In constructing a filter for GPS, we put two additional restrictions on $\mathcal{F}$. First,
we set a bound $h_{\max}$ on aggregate constraint violation, so that each point $y \in \mathcal{F}$
satisfies $h(y) < h_{\max}$. Second, we include only infeasible points in the filter and
track feasible points separately. This is done in order to avoid a problem with what
Fletcher and Leyffer [47] refer to as "blocking entries", in which a feasible filter point
with lower function value than a nearby local minimum prevents convergence to both
that minimum and a global minimum. Tracking feasible points outside of the filter
circumvents this uncommon but plausible scenario. With these two modifications,
the following terminology is now provided.

Definition 4.12 A point $x$ is said to be filtered by a filter $\mathcal{F}$ if any of
the following properties hold:

1. There exists a point $y \in \mathcal{F}$ such that $y \preceq x$ (that is, $y \prec x$, or $f(y) = f(x)$ and $h(y) = h(x)$);

2. $h(x) \ge h_{\max}$;

3. $h(x) = 0$ and $f(x) \ge f^F$, where $f^F$ is the objective function value of
the best feasible point found thus far.

The point $x$ is said to be unfiltered by $\mathcal{F}$ if it is not filtered by $\mathcal{F}$.

Thus, the set of filtered points, denoted by $\overline{\mathcal{F}}$, is given by

$$\overline{\mathcal{F}} = \bigcup_{x \in \mathcal{F}} \{y : y \succeq x\} \cup \{y : h(y) \ge h_{\max}\} \cup \{y : h(y) = 0,\ f(y) \ge f^F\}. \tag{4.8}$$

Observe that, with this notation, if a new trial point has the same function values as
those of any point in the filter, then the trial point is filtered. Thus, only the first
point with such values is accepted into the filter.
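The dominance relation and the three filtering conditions can be sketched in a few lines. This is only an illustration under assumed data structures: points are represented by their $(f, h)$ value pairs, and the names `dominates`, `is_filtered`, and `f_feas` are hypothetical.

```python
def dominates(a, b):
    """True if value pair a = (f, h) dominates b in the sense of Definition 4.10."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def is_filtered(point, filt, h_max, f_feas):
    """Check the three conditions of Definition 4.12 for point = (f, h).

    filt   : list of (f, h) pairs of the infeasible filter points
    h_max  : upper bound on the constraint violation
    f_feas : objective value of the best feasible point found so far
             (float('inf') if none has been found)
    """
    f, h = point
    if any(dominates(y, point) or y == point for y in filt):
        return True                 # condition 1 (ties are filtered too)
    if h >= h_max:
        return True                 # condition 2
    return h == 0 and f >= f_feas   # condition 3

filt = [(5.0, 1.0), (3.0, 2.0)]
inf = float("inf")
checks = [
    is_filtered((6.0, 1.5), filt, 10.0, inf),   # dominated by (5, 1)
    is_filtered((5.0, 1.0), filt, 10.0, inf),   # same values as a filter point
    is_filtered((4.0, 1.5), filt, 10.0, inf),   # trades f against h: unfiltered
    is_filtered((2.5, 0.0), filt, 10.0, 2.0),   # feasible but no better than f_feas
]
```

The list `checks` evaluates to `[True, True, False, True]`.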

In unconstrained GPS, the POLL step is performed by evaluating the objective
function at the mesh neighbors of the current iterate (i.e., the poll set) until an
improvement is found. When the filter is added for general NLP problems, the poll
set consists of the mesh neighbors of the poll center $p_k$, which is chosen either as the
incumbent best feasible point $p_k^F$ or the incumbent least infeasible point $p_k^I$ ($I$ and $F$
denote infeasible and feasible, respectively). The poll set centered at $p_k$ is therefore
defined as

$$P_k(p_k) = \{p_k + \Delta_k d \in X : d \in D_k\}, \tag{4.9}$$

where $D_k \subseteq D$ is the set of poll directions at iteration $k$. For convenience, we use
the notation $h_k^I = h(p_k^I) > 0$, $f_k^I = f(p_k^I)$, and $f_k^F = f(p_k^F)$. The flexibility in
selecting either point as the poll center does not affect convergence behavior, other
than converging to a different limit point [7]. The bi-loss graph in Figure 4.5 depicts a
filter with its possible poll centers for the next iteration. Observe that feasible points
lie on the vertical axis (labelled $f$).

We should note that polling around other points in the filter is certainly allowed;
however, other points do not possess the same convergence properties as $p_k^F$ and $p_k^I$
in the limit. Therefore, any such polling should be regarded as part of the SEARCH
step, so that the convergence theory is preserved.

The next two definitions describe the two possible outcomes of the SEARCH and
POLL steps.

Figure 4.5 Graphical Depiction of a Filter and Types of Iterations. (Legend: feasible region $\{x : h(x) = 0\}$; trial set $T_k$; filtered points $\overline{\mathcal{F}}_k$; unfiltered points found: $T_k \setminus \overline{\mathcal{F}}_k \ne \emptyset$; mesh isolated filter points: $T_k \subseteq \overline{\mathcal{F}}_k$.)

Definition 4.13 Let $T_k$ denote the set of trial points to be evaluated at
iteration $k$, and let $\overline{\mathcal{F}}_k$ denote the set of filtered points described by (4.8).
A point $y \in T_k$ is said to be an unfiltered point if $y \notin \overline{\mathcal{F}}_k$.

Definition 4.14 Let $P_k(p_k)$ denote the poll set centered at the point $p_k$,
and let $\overline{\mathcal{F}}_k$ denote the set of filtered points described by (4.8). The point
$p_k$ is said to be a mesh isolated filter point if $P_k(p_k) \subseteq \overline{\mathcal{F}}_k$. If $p_k \in \{p_k^F, p_k^I\}$,
then $p_k$ is said to be a mesh isolated poll center.

At  iteration  k,  if  an  unfiltered  point  is  found  in  either  the  SEARCH  or  POLL  step,  then 
the  mesh  is  coarsened  according  to  the  rule  in  (4.5),  and  the  poll  center  is  updated, 
if  appropriate.  If  an  unfiltered  point  is  not  found,  then  the  poll  center  pk  is  a  mesh 
isolated  filter  point,  and  the  mesh  is  refined  according  to  the  rule  in  (4.6). 

The Filter GPS algorithm for NLPs is presented in Figure 4.6. The algorithm
requires an initial population of the filter, which can be done in any manner stated
in [7]. Note that, if there are no constraints, then $h = 0$, and an unfiltered point only
applies to the single function value of $f$. Thus, in this case, the Filter GPS algorithm
reduces to the basic version shown in Figure 4.1.

Generalized Filter Pattern Search Algorithm - FGPS

Initialization: Let $x_0$ be an undominated point of a set of initial solutions. Include
all these points in the filter together with $h_{\max} > h(x_0)$. Fix $\Delta_0 > 0$.

For $k = 0, 1, 2, \ldots$, perform the following:

1. Update poll center $p_k \in \{p_k^F, p_k^I\}$.

2. Compute incumbent values $f_k^F = f(p_k^F)$, $h_k^I = h(p_k^I)$, $f_k^I = f(p_k^I)$.

3. SEARCH step: Employ some finite strategy seeking an unfiltered mesh point
$x_{k+1} \notin \overline{\mathcal{F}}_k$.

4. POLL step: If the SEARCH step did not find an unfiltered point, evaluate $f$
and $h$ at points in the poll set $P_k(p_k)$ until an unfiltered mesh point $x_{k+1} \notin \overline{\mathcal{F}}_k$
is found (or until done).

5. Update: If SEARCH or POLL found an unfiltered mesh point,
update filter $\mathcal{F}_{k+1}$ with $x_{k+1}$, and set $\Delta_{k+1} \ge \Delta_k$ according to (4.5);
otherwise, set $\mathcal{F}_{k+1} = \mathcal{F}_k$, and set $\Delta_{k+1} < \Delta_k$ according to (4.6).

Figure 4.6 Filter GPS Algorithm
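The steps of Figure 4.6 can be sketched in code under several simplifying assumptions that are not taken from the thesis: the SEARCH step is empty, the poll directions are the coordinate directions $\pm e_i$, the poll center is always the best feasible point when one exists, the mesh is left unchanged on success and halved on failure, and a single starting point seeds the filter. The function name `filter_gps` and the test problem are hypothetical.

```python
def filter_gps(f, h, x0, delta0=1.0, h_max=10.0, tol=1e-6, max_iter=1000):
    """Sketch of Filter GPS with an empty SEARCH step and D = {+e_i, -e_i}."""
    n = len(x0)
    D = [tuple(s if j == i else 0 for j in range(n))
         for i in range(n) for s in (1, -1)]
    filt = []          # infeasible filter entries, stored as (f, h, x)
    best_feas = None   # best feasible point found so far, stored as (f, x)

    def dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

    def filtered(fy, hy):
        if hy >= h_max:
            return True
        if hy == 0:
            return best_feas is not None and fy >= best_feas[0]
        return any(dominates(e[:2], (fy, hy)) or e[:2] == (fy, hy) for e in filt)

    def accept(fy, hy, y):
        nonlocal best_feas, filt
        if hy == 0:
            best_feas = (fy, y)
        else:
            # Drop entries dominated by the newcomer, then add it.
            filt = [e for e in filt if not dominates((fy, hy), e[:2])]
            filt.append((fy, hy, y))

    x0 = tuple(x0)
    if h(x0) >= h_max:
        raise ValueError("h(x0) must be below h_max")
    accept(f(x0), h(x0), x0)
    delta = delta0
    for _ in range(max_iter):
        if delta < tol:
            break
        # Assumed rule: poll around the best feasible point when one exists,
        # otherwise around the least infeasible filter point.
        p = best_feas[1] if best_feas else min(filt, key=lambda e: e[1])[2]
        found = False
        for d in D:
            y = tuple(pi + delta * di for pi, di in zip(p, d))
            fy, hy = f(y), h(y)
            if not filtered(fy, hy):
                accept(fy, hy, y)
                found = True
                break
        if not found:
            delta /= 2.0   # mesh isolated poll center: refine the mesh
    return best_feas, filt, delta

# Hypothetical test problem: minimize distance to (2, 2) subject to x1 + x2 <= 2.
f = lambda x: (x[0] - 2.0) ** 2 + (x[1] - 2.0) ** 2
h = lambda x: max(0.0, x[0] + x[1] - 2.0) ** 2
best, filt, delta = filter_gps(f, h, (0.0, 0.0))
```

With coordinate directions only, the feasible incumbent in this example is not expected to reach the constrained minimizer, since the directions do not conform to the constraint boundary; the sketch only shows how the filter, the poll, and the mesh update interact.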


4.2.2  Summary  of  Convergence  Results 

Convergence  results  for  the  filter  GPS  algorithm  are  presented  in  detail  in  [7].  They 
are  only  summarized  here,  since  the  full  theoretical  results  are  presented  for  the  more 
general  case  of  mixed  variable  problems  in  Chapter  6.  The  same  assumptions  as  those 
listed  in  Section  4.1.2  apply. 

Mesh  Size  Behavior 

The behavior of $\Delta_k$ is almost identical to that of $\Delta_k$ for the basic GPS algorithm.
Because the same assumptions hold (in particular, all iterates lie in a compact set),
the results of Torczon [130] apply; namely, that $\Delta_k$ is bounded above by a positive
constant independent of $k$, and $\liminf_{k \to +\infty} \Delta_k = 0$.

The addition of the filter makes it necessary to redefine the ideas of a refining
subsequence and a limit direction; however, the definitions now provided generalize
the previous ones, in that they are equivalent to those of the basic GPS method
if there are no nonlinear constraints.

Definition 4.15 A subsequence of mesh isolated poll centers $\{p_k\}_{k \in K}$
(for some subset of indices $K$) is said to be a refining subsequence if
$\{\Delta_k\}_{k \in K}$ converges to zero.

Definition 4.16 Let $\hat{p}$ be a limit point of a refining subsequence $\{p_k\}_{k \in K}$.
A direction $d \in D$ is said to be a limit direction of $\hat{p}$ if $p_k + \Delta_k d$ belongs
to $X$ and is filtered for infinitely many $k \in K$.

In  [7]  Audet  and  Dennis  prove  the  existence  of  convergent  refining  subsequences  and 
show  that  any  such  subsequence  has  at  least  one  limit  point  and  one  positive  spanning 
set  of  limit  directions. 

Results  for  the  Constraint  Violation  Function 

For  the  constraint  violation  function  h,  the  following  two  properties  hold  without 
assuming  any  constraint  qualification. 

Theorem 4.17 Let $\hat{p}$ be a limit point of a refining subsequence. Under
Assumptions A1-A3, if $h$ is Lipschitz continuous near $\hat{p}$, then $h^\circ(\hat{p}; d) \ge 0$
for all limit directions $d \in D$ of $\hat{p}$.

Theorem 4.18 Under Assumption A1, let $\hat{p}$ be a limit of a refining
subsequence. If $h$ is strictly differentiable at $\hat{p}$, then $\nabla h(\hat{p}) = 0$.

Although we would like to converge to a point $\hat{p}$ that is feasible (i.e., $h(\hat{p}) = 0$),
Theorem 4.18 only guarantees convergence to a first-order stationary point of $h$.
Still, any feasible point is also a stationary point (in fact, a global minimizer) of $h$;
therefore, the algorithm can (and often will) converge to such a feasible point. If $\hat{p}$
is not feasible, then $\hat{p}$ still satisfies first-order conditions for being a local minimizer
of $h$. This result may seem less advantageous, but it is no different than for many
gradient-based algorithms, such as sequential quadratic programming [114].

Results  for  the  Objective  Function 

The following results apply to the objective function $f$. Note that, in general,
convergence to a KKT point is not guaranteed, in that the negative gradient does not
necessarily lie inside the normal cone at the limit point. However, Theorem 4.20
specifies a cone contained in the tangent cone at the limit point, whose polar contains
the negative gradient.

Theorem 4.19 Let $\hat{p}$ be a limit point of a refining subsequence $\{p_k\}_{k \in K}$,
and let $d \in D$ be a limit direction of $\hat{p}$. Under Assumptions A1-A3, if $f$ is
Lipschitz continuous near $\hat{p}$ and $f(p_k) \le f(p_k + \Delta_k d)$ for infinitely many
$k$ in $K$, then $f^\circ(\hat{p}; d) \ge 0$.

Theorem 4.20 Let $\hat{p}$ be a limit point of a refining subsequence $\{p_k\}_{k \in K}$,
and let $V$ be the cone generated by all limit directions $d \in D$ of $\hat{p}$, for
which $f(p_k) \le f(p_k + \Delta_k d)$ infinitely often. Under Assumptions A1-A3,
if $f$ is strictly differentiable at $\hat{p}$, then $-\nabla f(\hat{p})$ belongs to the polar $V^\circ$ of
$V$.

Other corollaries associated with the Filter GPS algorithm assume that $C(x) \le 0$
holds, in which case, the results are identical to those of Section 4.1. Thus, this
algorithm is indeed a generalization of the basic GPS algorithm.

One other key result must be mentioned. The following theorem shows that if
$f$ is sufficiently smooth, the algorithm will not stall at any non-stationary point,
regardless of the positive spanning set chosen.

Theorem 4.21 If $h$ and $f$ are strictly differentiable at poll center $p_k$, and
if $\nabla f(p_k) \ne 0$, then there cannot be infinitely many consecutive iterations
where $p_k$ is a mesh isolated filter point.

Chapter 5

GPS for Linearly Constrained MVP Problems

In this chapter, the class of Mixed Variable Generalized Pattern Search (MGPS)
algorithms is described. The algorithm, which is developed and presented first, is the
work of Audet and Dennis [6]. Some notation is changed slightly, and some additional
definitions are provided for the sake of clarity. However, the convergence results that
follow are new and apply to a broader class of problems, as the smoothness
assumptions of Audet and Dennis have been relaxed. This algorithm is applied successfully
in [89] to the design of a thermal insulation system, an expanded version of which is
the subject of Chapter 7.

We  remind  the  reader  that,  as  discussed  in  Chapter  1,  categorical  variables  are 
those  that  must  take  on  values  from  a  pre-defined  list  (e.g.,  color,  material  type). 
They  are  often  assigned  numeric  values,  but  such  values  may  be  meaningless.  The 
key  point  is  that  these  variables  cannot  be  relaxed  so  as  to  take  on  other  values,  as 
would  be  done  in  most  algorithms  for  solving  mixed  integer  problems.  The  addition 
of categorical variables can also create other challenges not common in most optimization
problems. In particular, a change in discrete variable values can generate a
change  in  the  constraints,  and  even  a  change  in  the  dimension  of  the  problem. 

For mixed variable problems, we partition each point $x$ into its continuous and
discrete parts $x^c$ and $x^d$, respectively; i.e., $x = (x^c, x^d)$, where $x^d \in \mathbb{Z}^{n^d}$ and $x^c \in \mathbb{R}^{n^c}$,
and where $n^c$ and $n^d$ denote the maximum dimensions of the continuous and discrete
variables, respectively. (By convention, unused variables are ignored.)

The problem under consideration in this chapter is then expressed as follows:

$$\min_{x \in X} f(x) \tag{5.1}$$

where $f : X \subseteq \mathbb{R}^{n^c} \times \mathbb{Z}^{n^d} \to \mathbb{R} \cup \{\infty\}$, and where $X$ is partitioned into respective
continuous and discrete variable spaces, $X^c \subseteq \mathbb{R}^{n^c}$ and $X^d \subseteq \mathbb{Z}^{n^d}$, with respective
dimensions $n^c$ and $n^d$. Furthermore, the continuous variable space is defined by a
finite set of linear constraints, dependent on the values of $x^d$; i.e.,

$$X^c(x^d) = \{x^c \in \mathbb{R}^{n^c} : \ell(x^d) \le A(x^d) x^c \le u(x^d)\}, \tag{5.2}$$

where $A(x^d) \in \mathbb{R}^{m^c \times n^c}$ is a real matrix, $\ell(x^d), u(x^d) \in (\mathbb{R} \cup \{\pm\infty\})^{m^c}$, and $\ell(x^d) < u(x^d)$
for all values of $x^d$.

As noted in the previous chapter, equality constraints are problematic in practice
because points outside of $X$ are not evaluated by the algorithm, and any roundoff error
would eliminate feasible points from being considered. Thus, to use these methods in
practice, variables ought to be eliminated until all the constraints can be expressed
as $\ell(x^d) < u(x^d)$ for all values of $x^d$.

If $n^d = 0$, then the problem reduces to a standard linearly constrained NLP
problem, in which $\ell$, $A$, and $u$ do not change. For convenience, we now omit the
explicit dependence on $x^d$ in this chapter, even though it is understood for MVP
problems.

5.1  Mesh  Construction  and  Local  Optimality 

The  MGPS  algorithm  is  an  extension  of  the  basic  GPS  algorithm  in  which  discrete  or 
categorical  variables  are  permitted.  To  accommodate  this  change,  the  mesh  must  be 
defined  differently,  but  in  a  way  that  reduces  to  that  of  the  basic  GPS  mesh  structure 
if there are no discrete variables. The construction here is slightly more general
than in [8], in that no single generating matrix is required.

For each combination $i = 1, 2, \ldots, i_{\max}$ of values that the discrete variables may
take, a set of positive spanning directions $D^i$ is constructed by forming the product

$$D^i = G_i Z_i, \tag{5.3}$$

where $G_i \in \mathbb{R}^{n^c \times n^c}$ is a nonsingular generating matrix, and $Z_i \in \mathbb{Z}^{n^c \times |D^i|}$. We will
sometimes use $D(x)$ in place of $D^i$ to indicate that the set of directions is associated with
the discrete variable values of $x \in X$. The set $D$ is then defined by $D = \bigcup_{i=1}^{i_{\max}} D^i$.
The mesh is formed as the direct product of $X^d$ with the union of a finite number
of lattices in $X^c$; i.e.,

$$M_k = X^d \times \bigcup_{i=1}^{i_{\max}} \{x_k^c + \Delta_k D^i z \in X^c : z \in \mathbb{Z}_+^{|D^i|}\}. \tag{5.4}$$

We should note that the mesh is purely conceptual and is never explicitly created.
Instead, directions are only generated when necessary in the algorithm.

In  order  to  solve  problems  with  categorical  variables,  a  notion  of  local  optimality  is 
needed.  For  continuous  variables,  this  is  well-defined  in  terms  of  local  neighborhoods. 
However,  for  categorical  variables,  a  local  neighborhood  must  be  defined  by  the  user, 
and  there  may  be  no  obvious  choice  for  doing  so;  special  knowledge  of  the  underlying 
engineering  process  or  physical  problem  may  be  the  only  guide. 

To make the definition as general as possible, we define local neighborhoods in
terms of a set-valued function $\mathcal{N} : X \to 2^X$, where $2^X$ denotes the power set (or set
of all possible subsets of $X$). By convention, we assume that for all $x \in X$, the set
$\mathcal{N}(x)$ is finite, and $x \in \mathcal{N}(x)$.

For our analysis, we also require a definition of a limit in the mixed variable
domain, along with a notion of continuity for $\mathcal{N}$, given by the following definitions:

Definition 5.1 Let $X \subseteq (\mathbb{R}^{n^c} \times \mathbb{Z}^{n^d})$ be a mixed variable domain. A
sequence $\{x_i\} \subseteq X$ is said to converge to $x \in X$ if, for every $\epsilon > 0$, there
exists a positive integer $N$ such that $x_i^d = x^d$ and $\|x_i^c - x^c\| < \epsilon$ for all
$i > N$. The point $x$ is said to be the limit point of the sequence $\{x_i\}$.

Definition 5.2 The set-valued function $\mathcal{N} : X \subseteq (\mathbb{R}^{n^c} \times \mathbb{Z}^{n^d}) \to 2^X$ is
said to be continuous at $x \in X$ if, for every $\epsilon > 0$, there exists $\delta > 0$ such
that, whenever $u \in X$ satisfies $u^d = x^d$ and $\|u^c - x^c\| < \delta$, the following
two conditions hold:

1. If $y \in \mathcal{N}(x)$, then there exists $v \in \mathcal{N}(u)$ satisfying $v^d = y^d$ and
$\|v^c - y^c\| < \epsilon$,

2. If $v \in \mathcal{N}(u)$, then there exists $y \in \mathcal{N}(x)$ satisfying $y^d = v^d$ and
$\|y^c - v^c\| < \epsilon$.

Definition 5.2 will ensure in Theorem 5.12 that, given a certain convergent
subsequence of iterates and a corresponding convergent subsequence of points that are
discrete neighbors of these iterates, the limit point of the discrete neighbor points is
itself a discrete neighbor of the limit point of the iterates.

As an example, one common choice of neighborhood function for integer variables
is the one defined by $\mathcal{N}(x) = \{y \in X^d : \|y - x\|_1 \le 1\}$. However, categorical variables
may have no inherent ordering, which would make this choice inapplicable.
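For integer variables, the 1-norm neighborhood above can be enumerated directly. The sketch below assumes, purely for illustration, that $X^d$ is a box given by hypothetical `lower` and `upper` bounds; the function name `l1_neighbors` is likewise an assumption.

```python
def l1_neighbors(xd, lower, upper):
    """Integer points within 1-norm distance 1 of xd that satisfy the bounds."""
    nbrs = [tuple(xd)]                      # x itself belongs to N(x)
    for i in range(len(xd)):
        for step in (-1, 1):
            y = list(xd)
            y[i] += step                    # change one coordinate by one unit
            if lower[i] <= y[i] <= upper[i]:
                nbrs.append(tuple(y))
    return nbrs

neighbors = l1_neighbors((0, 2), lower=(0, 0), upper=(3, 3))
```

For the point $(0, 2)$ in the box $[0, 3]^2$, this returns the point itself together with $(1, 2)$, $(0, 1)$, and $(0, 3)$. A categorical variable would instead need a user-supplied neighbor list, since adding or subtracting one has no meaning for it.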

With a well-defined neighborhood function, we can now extend the classical
definition of local optimality to mixed variable domains, by the following slight modification
of a similar definition by Audet and Dennis [8].

Definition 5.3 A point $x = (x^c, x^d) \in X$ is said to be a local minimizer
of $f$ with respect to the set of neighbors $\mathcal{N}(x) \subset X$ if there exists an $\epsilon > 0$
such that $f(x) \le f(v)$ for all $v$ in the set

$$X \cap \bigcup_{y \in \mathcal{N}(x)} \left(B(y^c, \epsilon) \times y^d\right). \tag{5.5}$$
The function $\mathcal{N}$ also must be constructed so that every discrete neighbor of the
current iterate lies on the mesh; i.e., $\mathcal{N}(x_k) \subseteq M_k$ for all $k = 0, 1, \ldots$. This will be
explicitly stated as an assumption in Section 5.3. Also observe that each lattice in
(5.4) is expressed as a translation from $x_k^c$, as opposed to $y_k^c$, for some $y_k \in \mathcal{N}(x_k)$.
This is necessary to ensure convergence of the algorithm.

Note  that  this  construction  does  not  require  that  a  point  and  its  discrete  neighbors 
have  the  same  continuous  variable  values;  rather,  it  requires  that  they  both  lie  on  the 
same  mesh.  In  fact,  Kokkolaras  et  al.  [89]  construct  their  neighbor  sets  in  a  way  that 
neighbors  often  do  not  have  the  same  continuous  variable  values. 

5.2  The  MGPS  Algorithm 

Just like the basic GPS algorithm, each iteration of the MGPS algorithm consists of
a SEARCH and possibly a POLL step. The SEARCH step is simply a global search of a
finite number of trial points on the mesh. It is almost identical to that of the basic
GPS algorithm, and is discussed in detail in Chapters 2 and 4.

Polling in the MGPS algorithm is performed with respect to the continuous variables,
the discrete neighbor points, and the set of points generated by an EXTENDED
POLL step.

At iteration $k$, let $D_k^i \subseteq D^i$ denote the set of poll directions corresponding to the
$i$-th set of discrete variable values, and we define $D_k = \bigcup_{i=1}^{i_{\max}} D_k^i$. Again, we will often
write $D_k^i$ as $D_k(x_k)$ to indicate that the polling directions are associated with the
discrete variable values of $x_k$. The poll set centered at $x_k$ is defined as

$$P_k(x_k) = \{x_k + \Delta_k (d, 0) \in X : d \in D_k(x_k)\}. \tag{5.6}$$

We remind the reader that the notation $(d, 0)$ is consistent with the partitioning into
continuous and discrete variables, respectively, where 0 means that discrete variables
do not change value. Thus, $x_k + \Delta_k (d, 0) = (x_k^c + \Delta_k d, x_k^d)$.
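A mixed variable poll set can be sketched as follows. The representation of a point as a pair `(xc, xd)`, the function name `poll_set`, the membership test `in_domain`, and the example data are all assumptions made for illustration.

```python
def poll_set(xc, xd, delta, directions, in_domain):
    """Points (xc + delta*d, xd) for d in directions that lie in the domain X.

    Only the continuous part moves; the discrete part xd is held fixed,
    which corresponds to the (d, 0) notation.
    """
    points = []
    for d in directions:
        yc = tuple(a + delta * b for a, b in zip(xc, d))
        if in_domain(yc, xd):
            points.append((yc, xd))
    return points

D = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def in_box(yc, yd):
    # Hypothetical domain: continuous variables restricted to the unit box.
    return all(0.0 <= v <= 1.0 for v in yc)

pts = poll_set((0.5, 1.0), ("steel",), 0.25, D, in_box)
```

Three of the four candidate points survive here, because the step in the direction $(0, 1)$ leaves the box and is therefore never evaluated. The categorical value `"steel"` is carried along unchanged.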

In some cases where the poll set and set of discrete neighbors fail to produce a
lower objective function value, MGPS performs an EXTENDED POLL step, in which
additional polling is performed around any points in the set of discrete neighbors
whose objective function value is sufficiently close to the incumbent value. That is,
if $y \in \mathcal{N}(x_k)$ satisfies $f(x_k) \le f(y) < f(x_k) + \xi_k$ for some user-specified tolerance
value $\xi_k \ge \xi$ (called the extended poll trigger), where $\xi$ is a fixed positive scalar,
then we begin a polling sequence $\{y_k^j\}_{j=1}^{J_k}$ with respect to the continuous neighbors
of $y_k^j$, beginning with $y_k^1 = y_k$ and ending when either $f(y_k^{J_k} + \Delta_k (d, 0)) < f(x_k)$
for some $d \in D_k(y_k^{J_k})$, or when $f(x_k) \le f(y_k^{J_k} + \Delta_k (d, 0))$ for all $d \in D_k(y_k^{J_k})$. For
this discussion, we let $z_k = y_k^{J_k}$, the last iterate (or endpoint) of the EXTENDED
POLL step. We should note that in practice, the parameter $\xi_k$ is typically set as
a percentage of the objective function value (but bounded away from zero), such
as, say, $\xi_k = \max\{\xi, 0.05 |f(x_k)|\}$. A relatively high choice of $\xi_k$ will generate more
EXTENDED POLL steps, thus making a solution more global, but it will require more
function evaluations. On the other hand, a lower value of $\xi_k$ will cost fewer function
evaluations, but it will tend to make the solution more local.
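The trigger rule can be sketched as a small selection routine. The function name, the dictionary of neighbor values, and the default floor `xi=1e-3` are hypothetical; the 5% fraction follows the example value quoted in the text.

```python
def extended_poll_candidates(f_x, neighbor_values, xi=1e-3, fraction=0.05):
    """Return the trigger xi_k and the neighbors that start an extended poll.

    f_x             : incumbent objective value f(x_k)
    neighbor_values : dict mapping a neighbor label to its objective value f(y)
    xi              : fixed positive lower bound on the trigger
    """
    xi_k = max(xi, fraction * abs(f_x))
    chosen = [name for name, f_y in neighbor_values.items()
              if f_x <= f_y < f_x + xi_k]
    return xi_k, chosen

values = {"a": 10.2, "b": 10.6, "c": 12.0, "d": 9.0}
xi_k, chosen = extended_poll_candidates(10.0, values)
```

With an incumbent value of 10, the trigger is about 0.5, so only neighbor `"a"` qualifies. Neighbor `"d"` is excluded because it already improves on the incumbent and would be accepted directly rather than polled around.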

The set of extended poll points for a discrete neighbor $y \in \mathcal{N}(x_k)$, denoted $\mathcal{E}(y)$,
contains a subset of the points $\{P_k(y_k^j)\}_{j=1}^{J_k}$. At iteration $k$, the set of points evaluated
in the extended poll step (or extended poll set) is given by

$$\mathcal{X}_k(\xi_k) = \bigcup_{y \in \mathcal{N}_k^{\xi}} \mathcal{E}(y), \tag{5.7}$$

where $\mathcal{N}_k^{\xi} = \{y \in \mathcal{N}(x_k) : f(x_k) \le f(y) < f(x_k) + \xi_k\}$.

The set of trial points is defined as $T_k = S_k \cup P_k(x_k) \cup \mathcal{N}(x_k) \cup \mathcal{X}_k(\xi_k)$, where
$S_k$ is the finite set of mesh points evaluated during the SEARCH step. Similar to the
basic GPS algorithm described in Section 4.1, the following definitions define the two
outcomes of the SEARCH, POLL, and EXTENDED POLL steps.

Definition 5.4 If $f(y) < f(x_k)$ for some $y \in T_k$, then $y$ is said to be an
improved mesh point.

Definition 5.5 If $f(x_k) \le f(y)$ for all $y \in P_k(x_k) \cup \mathcal{N}(x_k) \cup \mathcal{X}_k(\xi_k)$, then
$x_k$ is said to be a mesh local optimizer.

These trial points are illustrated in Figure 5.1, where they are indexed by iteration
number $k$. In the lower plane, the sequence of MGPS iterates is given by the points
$x_k$ with limit point $\hat{x}$. (We assume that $k$ is sufficiently large, so that $x_k^d = x_{k+1}^d = \ldots = \hat{x}^d$.)
In the upper plane, the sequence of discrete neighbor points represented by
$y_k \in \mathcal{N}(x_k)$ converges to $\hat{y} \in \mathcal{N}(\hat{x})$. For each iteration $k$, the EXTENDED POLL step
is represented by the sequence $y_k, \ldots, y_k^j, \ldots, z_k$, where the sequence of extended poll
endpoints represented by $z_k$ converges to limit point $\hat{z}$, and intermediate extended
poll centers $y_k^j$ converge to limit point(s) $\bar{y}$. We should note that existence of the limit
points described here has not yet been established, but will be proven in Section 5.3,
and in particular, Theorem 5.12.

Figure 5.1 Limit Points of Iterates and Extended Poll Centers.

Figure 5.2 shows an example with one discrete and two continuous variables, in which the set of neighbors of the current iterate $x_k$ is assumed to be $\mathcal{N}(x_k) = \{x_k, y_1, y_2\}$. (The subscripts 1 and 2 are added to distinguish the points in the set of neighbors $\mathcal{N}(x_k)$. Note that the points in $\mathcal{N}(x_k)$ do not have the same continuous variable values.) The iterate $x_k$ is a local minimizer of $f$ on the set $P_k(x_k) \cup \mathcal{N}(x_k) \cup X_k(\xi_k)$ if, for some $\epsilon > 0$, $f(x_k) \le f(v)$ for all $v \in B(x_k, \epsilon) \cup B(y_1, \epsilon) \cup B(y_2, \epsilon)$. The
letters  a  to  e,  i  to  l,  and  u  to  w  in  the  figure  represent  poll  points  around  the  different 
poll  centers,  xk,yi,y\,  and  y%. 

The MGPS algorithm is presented formally in Figure 5.3. Note that the current iterate and the mesh size parameter are updated in exactly the same way as in the basic GPS algorithm presented in Figure 4.1.
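The iteration of Figure 5.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis software: the SEARCH step is left empty, the poll directions are the positive and negative coordinate directions, the mesh size is kept unchanged after a success and halved after a failure, the trigger test uses non-strict inequalities, and the toy problem (`CENTER`, `OFFSET`, `neighbors`) is hypothetical.

```python
def poll_points(center, delta):
    """Poll set around center: +/- delta along each continuous coordinate."""
    xc, xd = center
    points = []
    for i in range(len(xc)):
        for sign in (1.0, -1.0):
            yc = list(xc)
            yc[i] += sign * delta
            points.append((tuple(yc), xd))
    return points


def extended_poll(f, y, fx, delta):
    """Poll around y and its successors; return an improved mesh point or None."""
    center = y
    while True:
        better = next((p for p in poll_points(center, delta)
                       if f(p) < f(center)), None)
        if better is None:
            return None            # center is the extended poll endpoint z_k
        if f(better) < fx:
            return better          # improved mesh point found
        center = better


def mgps(f, neighbors, x0, delta0=1.0, xi=0.5, delta_min=1e-3, max_iter=1000):
    x, delta = x0, delta0
    for _ in range(max_iter):
        if delta < delta_min:
            break
        fx = f(x)
        nbrs = neighbors(x)
        # POLL step: continuous poll points, then discrete neighbors.
        improved = next((y for y in poll_points(x, delta) + nbrs
                         if f(y) < fx), None)
        # EXTENDED POLL step around neighbors within xi of the incumbent.
        if improved is None:
            for y in nbrs:
                if fx <= f(y) <= fx + xi:
                    improved = extended_poll(f, y, fx, delta)
                    if improved is not None:
                        break
        if improved is not None:
            x = improved           # success: keep the mesh size
        else:
            delta /= 2.0           # mesh local optimizer: refine the mesh
    return x, delta


# Hypothetical toy problem: one continuous and one discrete variable.
CENTER = {0: 0.0, 1: 3.0, 2: -2.0}
OFFSET = {0: 1.0, 1: 0.0, 2: 2.0}


def f(point):
    (xc,), xd = point
    return (xc - CENTER[xd]) ** 2 + OFFSET[xd]


def neighbors(point):
    xc, xd = point
    return [(xc, d) for d in (xd - 1, xd + 1) if d in CENTER]


x0 = ((0.0,), 0)
local_result, _ = mgps(f, neighbors, x0, xi=0.5)
global_result, _ = mgps(f, neighbors, x0, xi=10.0)
print(local_result, global_result)
```

On this toy problem the small trigger never starts an extended poll, so the run stays at the starting point, while the large trigger follows the discrete neighbor downhill to a better point. This mirrors the trade-off between a more global solution and fewer function evaluations.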

5.3  New  Convergence  Results 

A key assumption in the original Audet and Dennis GPS paper [8] for mixed variable problems, consistent with the assumptions of Torczon [130] and Lewis and Torczon [94, 95], is that the objective function is continuously differentiable with respect to the continuous variables on a neighborhood of the level set $L_X(x_0) = \{x \in X : f(x) \le f(x_0)\}$, where $x_0 \in X$ is the initial iterate. This section is devoted to new convergence results for the bound and linearly constrained cases, where assumptions on the objective function are relaxed, so that the theory is consistent with [6] and [7], as described in Chapter 4. Specifically, we use the same tools from the non-smooth calculus of Clarke [31] to provide a more complete characterization of the
limit points. This approach is continued in Chapter 6 so that the FMGPS algorithm presented there will unify all of the GPS theory to date into one framework for an extremely broad class of optimization problems.

$P_k(x_k) = \{u, v, w\}$

$\mathcal{N}(x_k) = \{x_k, y_1, y_2\}$

$f(x_k) \le f(y_1) \le f(x_k) + \xi_k < f(y_2)$

$X_k(\xi_k) = \{y_1\} \cup \{a, b, c\}$

Figure 5.2 Construction of the Poll and Extended Poll Sets

The  following  assumptions  are  made,  the  first  three  of  which  are  identical  to  those 
of  Section  4.1.2: 

A1: All iterates $\{x_k\}$ produced by the MGPS algorithm lie in a compact set.

A2: For each fixed set of discrete variable values $x^d$, the corresponding set of directions $D^i = G_i Z_i$, as defined in (5.3), includes tangent cone generators for every point in $X^c(x^d)$.

A3: The rule for selecting directions $D_k$ conforms to $X^c$ for some $\epsilon > 0$ (see Definition 4.3).

A4: The discrete neighbors always lie on the mesh; i.e., $\mathcal{N}(x_k) \subset M_k$ for all $k$.

General Mixed Variable Pattern Search Algorithm - MGPS

Initialization: Let $x_0 \in X$ satisfy $f(x_0) < \infty$. Set $\Delta_0 > 0$ and $\xi > 0$.

For $k = 0, 1, 2, \ldots$, perform the following:

1. Set extended poll trigger $\xi_k \ge \xi$.

2. Search step: Employ some finite strategy seeking an improved mesh point; i.e., $x_{k+1} \in M_k$ such that $f(x_{k+1}) < f(x_k)$.

3. Poll step: If the SEARCH step does not find an improved mesh point, evaluate $f$ at points in $P_k(x_k) \cup \mathcal{N}(x_k)$ until an improved mesh point $x_{k+1}$ is found (or until done).

4. Extended Poll step: If the SEARCH and POLL steps do not find an improved mesh point, evaluate $f$ at points in $X_k(\xi_k)$ until an improved mesh point $x_{k+1}$ is found (or until done).

5. Update: If SEARCH, POLL, or EXTENDED POLL finds an improved mesh point, update $x_{k+1}$, and set $\Delta_{k+1} \ge \Delta_k$ according to (4.5); otherwise, set $x_{k+1} = x_k$, and set $\Delta_{k+1} < \Delta_k$ according to (4.6).

Figure 5.3 MGPS Algorithm

The first result, due to Audet and Dennis [6], describes the behavior of iteration sequences under the mildest assumptions.


Theorem 5.6 Under Assumption A1, there exists at least one limit point of the iteration sequence $\{x_k\}$. If $f$ is lower semicontinuous at such a limit point $\hat{x}$ with respect to the continuous variables, then $\lim_k f(x_k)$ exists and is greater than or equal to $f(\hat{x})$, which is finite. If $f$ is continuous at every limit point of $\{x_k\}$, then every limit point has the same function value.

Proof. Since $f$ is lower semicontinuous at $\hat{x}$, we know that for any subsequence $\{x_k\}_{k \in K}$ of the iteration sequence that converges to $\hat{x}$, $\liminf_{k \in K} f(x_k) \ge f(\hat{x})$, which is finite because $f(x_k)$ is finite for each $k \in K$. But the subsequence of function values is a subsequence of a nonincreasing sequence. Thus, the entire sequence is also bounded below by $f(\hat{x})$, and so it converges. ■

5.3.1  Mesh  Size  Behavior 

The behavior of the mesh size was originally characterized for NLP problems by Torczon [130], independent of the smoothness of the objective function. It was extended to MVP problems by Audet and Dennis [8], who later adapted a Torczon lemma to provide a lower bound on the distance between mesh points at each iteration [6]. The proofs here are straightforward extensions of the latter work to MVP problems. The first lemma provides the lower bound on the distance between any two mesh points whose continuous variable values do not coincide, while the second lemma shows that the mesh size parameter is bounded above. The theorem that follows shows the key Torczon [130] result that $\liminf_{k \to +\infty} \Delta_k = 0$.

Lemma 5.7 For any integer $k \ge 0$, let $u$ and $v$ be any distinct points in the mesh $M_k$ such that $u^d = v^d$. Then for any norm for which all nonzero integer vectors have norm at least 1,

$$\|u^c - v^c\| \ge \frac{\Delta_k}{\|G_i^{-1}\|}, \qquad (5.8)$$

where the index $i$ corresponds to the discrete variable values of $u$ and $v$.

Proof. Using (5.4), we let $u^c = x_k^c + \Delta_k D^i z_u$ and $v^c = x_k^c + \Delta_k D^i z_v$ define the continuous part of two distinct points on $M_k$ with both $z_u, z_v \in \mathbb{Z}_+^{|D^i|}$. Furthermore, since we assume that $u$ and $v$ are distinct with $u^d = v^d$, we must have $u^c \ne v^c$, and thus $z_u \ne z_v$. Then

$$\|u^c - v^c\| = \Delta_k \|D^i (z_u - z_v)\| = \Delta_k \|G_i Z_i (z_u - z_v)\| \ge \Delta_k \frac{\|Z_i (z_u - z_v)\|}{\|G_i^{-1}\|} \ge \frac{\Delta_k}{\|G_i^{-1}\|},$$

the last inequality because $Z_i(z_u - z_v)$ is a nonzero integer vector with norm greater than or equal to one. ■
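The bound of Lemma 5.7 can be checked numerically on a small example. The sketch below is only an illustration: the matrices `G` and `Z`, the mesh size `delta`, and the enumeration range are hypothetical, and the infinity norm is used because every nonzero integer vector has infinity norm at least 1.

```python
from itertools import product


def inf_norm(v):
    return max(abs(c) for c in v)


def mat_inf_norm(M):
    """Matrix norm induced by the infinity norm (maximum absolute row sum)."""
    return max(sum(abs(c) for c in row) for row in M)


def matvec(M, v):
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical data: nonsingular G_i and integer Z_i, so that D^i = G_i Z_i.
G = [[2.0, 1.0], [0.0, 1.0]]
Z = [[1, 0, -1, 0], [0, 1, 0, -1]]
delta = 0.25

bound = delta / mat_inf_norm(inverse_2x2(G))

# Smallest distance between continuous parts u^c != v^c over a small range
# of nonnegative integer vectors z_u and z_v.
min_dist = float("inf")
for z_u in product(range(3), repeat=4):
    for z_v in product(range(3), repeat=4):
        w = matvec(Z, [a - b for a, b in zip(z_u, z_v)])   # Z_i (z_u - z_v)
        if any(w):                                          # u^c != v^c
            min_dist = min(min_dist, delta * inf_norm(matvec(G, w)))

print(min_dist, bound)
```

The enumeration covers only finitely many mesh points, so it illustrates the inequality rather than proving it.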

Lemma 5.8 There exists a positive integer $r^u$ such that $\Delta_k \le \Delta_0 \tau^{r^u}$ for any integer $k \ge 0$.

Proof. Under Assumption A1, the discrete variables can only take on a finite number of values in $L_X(x_0)$. Let $i_{\max}$ denote this number, and let $I = \{1, 2, \ldots, i_{\max}\}$. Also under Assumption A1, for each $i \in I$, let $Y_i$ be a compact set in $\mathbb{R}^{n^c}$ containing all GPS iterates whose discrete variable values correspond to $i \in I$. Let $\gamma = \max_{i \in I} \operatorname{diam}(Y_i)$ and $\beta = \max_{i \in I} \|G_i^{-1}\|$, where diam indicates the maximum distance between any two points. If $\Delta_k > \gamma\beta$, then Lemma 5.7 (with $v = x_k$) ensures that any trial point $u \in M_k$ either satisfies $u^c = x_k^c$ or would lie outside of $\bigcup_{i \in I} Y_i$. Then if $\Delta_k > \gamma\beta$, no more than $i_{\max}$ successful iterations will occur before $\Delta_k$ falls below $\gamma\beta$. Thus, $\Delta_k$ is bounded above by $\gamma\beta(\tau^{m_{\max}})^{i_{\max}}$, and the result follows by setting $r^u$ large enough so that $\Delta_0 \tau^{r^u} \ge \gamma\beta(\tau^{m_{\max}})^{i_{\max}}$. ■

Theorem 5.9 The mesh size parameters satisfy $\liminf_{k \to +\infty} \Delta_k = 0$.

Proof. (Torczon [130]) Suppose by way of contradiction that there exists a negative integer $r^\ell$ such that $0 < \Delta_0 \tau^{r^\ell} \le \Delta_k$ for all integers $k \ge 0$. Combining (4.7) with Lemma 5.8 implies that for any integer $k \ge 0$, $r_k$ takes its value from among the integers of the finite set $\{r^\ell, r^\ell + 1, \ldots, r^u\}$. Therefore, $r_k$ and $\Delta_k$ can only take a finite number of values for all $k \ge 0$.

Since $x_{k+1} \in M_k$, (5.4) ensures that $x_{k+1}^c = x_k^c + \Delta_k D^i z_k$ for some $z_k \in \mathbb{Z}_+^{|D^i|}$ and $1 \le i \le i_{\max}$. By repeatedly applying this equation and substituting $\Delta_k = \Delta_0 \tau^{r_k}$, it follows that, for any integer $N \ge 1$,

$$x_N^c = x_0^c + \sum_{k=1}^{N-1} \Delta_k D^i z_k = x_0^c + \Delta_0 D^i \sum_{k=1}^{N-1} \tau^{r_k} z_k = x_0^c + \frac{p^{r^\ell}}{q^{r^u}} \Delta_0 D^i \sum_{k=1}^{N-1} p^{r_k - r^\ell} q^{r^u - r_k} z_k,$$

where $p$ and $q$ are relatively prime integers satisfying $\tau = p/q$. Since $p^{r_k - r^\ell} q^{r^u - r_k} z_k$ is an integer vector for any $k$, it follows that the continuous part of all iterates having the same discrete variable values lies on the translated integer lattice generated by $x_0^c$ and the columns of $\frac{p^{r^\ell}}{q^{r^u}} \Delta_0 D^i$. Moreover, the discrete part of all iterates also lies on the integer lattice $X^d \subset \mathbb{Z}^{n^d}$.

Therefore, since all iterates belong to a compact set, there must be only a finite number of different iterates, and thus one of them must be visited infinitely many times. Therefore, the mesh coarsening rule in (4.5) is only applied finitely many times, and the mesh refining rule in (4.6) is applied infinitely many times. This contradicts the hypothesis that $\Delta_0 \tau^{r^\ell}$ is a lower bound for the mesh size parameter. ■
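The lattice argument in the proof can be illustrated with exact rational arithmetic. The sketch below is an assumption-laden toy: the values of `tau`, `delta0`, and the exponent bounds are hypothetical, the problem is one-dimensional, and the sign of each step is folded into a signed integer `z_k` instead of using separate positive and negative directions.

```python
from fractions import Fraction
import random

tau = Fraction(3, 2)            # hypothetical rational mesh factor tau = p/q
p, q = tau.numerator, tau.denominator
delta0 = Fraction(1)
r_lo, r_hi = -3, 2              # assumed bounds r^l <= r_k <= r^u

# Lattice spacing (p^{r^l} / q^{r^u}) * Delta_0 from the proof.
spacing = Fraction(p) ** r_lo / Fraction(q) ** r_hi * delta0

# Every admissible mesh size is an integer multiple of the spacing.
multiples = [delta0 * tau ** r / spacing for r in range(r_lo, r_hi + 1)]

# One-dimensional walk x_{k+1} = x_k + Delta_k * z_k with integer z_k.
rng = random.Random(0)
x0 = Fraction(1, 7)
x = x0
for _ in range(50):
    r_k = rng.randint(r_lo, r_hi)
    z_k = rng.randint(-3, 3)
    x += delta0 * tau ** r_k * z_k

# The displacement from x0 is an integer number of lattice spacings.
lattice_coordinate = (x - x0) / spacing
print(multiples, lattice_coordinate)
```

Because the exponents are bounded, all iterates sit on one fixed lattice; with an irrational factor no such common spacing exists, which is why rationality of $\tau$ matters.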

These results show the necessity of forcing the set of directions to satisfy $D^i = G_i Z_i$. Under Assumption A1, this ensures that the mesh has only a finite number of points in $X$, which means that there can only be a finite number of consecutive improved mesh points. Assumption A2 is included simply to ensure that this construction is maintained in the presence of linear constraints. Audet and Dennis [6] provide an example in which a different construction yields a mesh that is dense in $X$. In this case, Lemma 5.7 cannot be satisfied, and convergence of $\Delta_k$ to zero is not guaranteed. A sufficient condition for Assumption A2 to hold is that $G_i = I$ for each $i = 1, 2, \ldots, i_{\max}$ and that the coefficient matrix $A$ is rational [95].

We should note also that the rationality of $\tau$ is essential for convergence. In a revised version of [4], Audet gives an example in which an irrational value for $\tau$ generates a sequence satisfying $\liminf_{k \to +\infty} \Delta_k > 0$.

5.3.2  Refining  Subsequences 

Since $\Delta_k$ shrinks only at iterations in which no improved mesh point is found, Theorem 5.9 guarantees that the MGPS algorithm has infinitely many such iterations. Recall from Definition 4.4 that subsequences of the corresponding iterates are called refining subsequences. Since the concept of a limit direction must be redefined for mixed variable spaces, both definitions are now provided.

Definition 5.10 A subsequence of GPS mesh local optimizers $\{x_k\}_{k \in K}$ (for some subset of indices $K$) is said to be a refining subsequence if $\{\Delta_k\}_{k \in K}$ converges to zero.

Definition 5.11 Let $\{v_k\}_{k \in K}$ be either a refining subsequence or a corresponding subsequence of extended poll endpoints, and let $\hat{v}$ be a limit point of the subsequence. A direction $d \in D$ is said to be a limit direction of $\hat{v}$ if $w_k = v_k + \Delta_k (d, 0) \in X$ and $f(v_k) \le f(w_k)$ for infinitely many $k \in K$.

The  next  theorem,  due  to  Audet  and  Dennis  [8]  (but  slightly  reworded),  identifies 
and  proves  existence  of  limits  of  refining  subsequences,  their  discrete  neighbors,  and  of 
corresponding  extended  poll  endpoints.  The  result  is  independent  of  any  assumptions 
on  /. 

Theorem 5.12 Let $L_X(x_0) = \{x \in X : f(x) \le f(x_0)\}$. There exists a point $\hat{x} \in L_X(x_0)$ and a refining subsequence $\{x_k\}_{k \in K}$ (with associated index set $K$) such that $\lim_{k \in K} x_k = \hat{x}$. Moreover, if $\mathcal{N}$ is continuous at $\hat{x}$, then there exist $\hat{y} \in \mathcal{N}(\hat{x})$ and $\hat{z} = (\hat{z}^c, \hat{y}^d) \in X$ such that

$$\lim_{k \in K} y_k = \hat{y} \quad \text{and} \quad \lim_{k \in K} z_k = \hat{z},$$

where each $z_k \in X$ is the endpoint of the EXTENDED POLL step initiated at $y_k \in \mathcal{N}(x_k)$.

Proof. (Audet and Dennis [8]). Theorem 5.9 guarantees that $\liminf_{k \to +\infty} \Delta_k = 0$; thus, there is an infinite subset of indices of mesh local optimizers $K' \subset \{k : \Delta_{k+1} < \Delta_k\}$, such that the subsequence $\{\Delta_k\}_{k \in K'}$ converges to zero. Since all iterates $x_k$ lie in the compact set $L_X(x_0)$, there exists an infinite subset of indices $K'' \subset K'$ such that the subsequence $\{x_k\}_{k \in K''}$ converges. Let $\hat{x} \in L_X(x_0)$ be the limit point of such a subsequence.

The continuity of $\mathcal{N}$ at $\hat{x}$ guarantees that $\hat{y} \in \mathcal{N}(\hat{x}) \subset X$ is a limit point of a subsequence $y_k \in \mathcal{N}(x_k)$. Let $\hat{z} \in X$ be a limit point of the sequence $z_k \in X$ of endpoints of the EXTENDED POLL step initiated at $y_k$. (By definition, $z_k = y_k$ whenever the EXTENDED POLL step is not required.) Choose $K \subset K''$ such that $\{y_k\}_{k \in K}$ converges to $\hat{y}$ and $\{z_k\}_{k \in K}$ is convergent, letting $\hat{z}$ denote its limit point. ■

5.3.3  Results  for  the  Objective  Function 

Theorem 5.12 defines and establishes existence of the limit points $\hat{x}$, $\hat{y}$, and $\hat{z}$. We now establish results for the objective function at these limit points. The first theorem, which Audet and Dennis originally proved by contradiction [8], establishes local optimality conditions of $\hat{x}$ with respect to its discrete neighbors. The direct proof presented here is shorter and relaxes assumptions on $f$ as much as possible.

Theorem 5.13 Let $\hat{x}$ and $\hat{y} \in \mathcal{N}(\hat{x})$ be defined as in the statement of Theorem 5.12, such that $\mathcal{N}$ is continuous at $\hat{x}$. If $f$ is lower semicontinuous at $\hat{x}$ with respect to the continuous variables and continuous at $\hat{y}$ with respect to the continuous variables, then $f(\hat{x}) \le f(\hat{y})$.

Proof. From Theorem 5.12, we know that $\{x_k\}_{k \in K}$ converges to $\hat{x}$ and $\{y_k\}_{k \in K}$ converges to $\hat{y}$. Since $k \in K$ ensures that $\{x_k\}_{k \in K}$ are mesh local optimizers, we have $f(x_k) \le f(y_k)$ for all $k \in K$, and by the assumptions of continuity and lower semicontinuity, we have $f(\hat{x}) \le \lim_{k \in K} f(x_k) \le \lim_{k \in K} f(y_k) = f(\hat{y})$. ■

The remaining results in this section show convergence to points satisfying appropriate optimality conditions with respect to the continuous domain under weaker assumptions than those found in [8]. The next two theorems establish a directional optimality condition if the objective function is Lipschitz near the limit point of a refining subsequence or certain corresponding extended poll endpoints. Recall that $D$ is defined as the finite union of mesh directions $D^i$, $i = 1, 2, \ldots, i_{\max}$.

Theorem 5.14 Let $\hat{x}$ be a limit point of a refining subsequence. Under Assumptions A1-A3, if $f$ is Lipschitz near $\hat{x}$ with respect to the continuous variables, then $f^\circ(\hat{x}; (d, 0)) \ge 0$ for all limit directions $d \in D$ of $\hat{x}$.

Proof. From Definition 3.9, we have

$$f^\circ(\hat{x}; (d, 0)) = \limsup_{y \to \hat{x},\ t \downarrow 0} \frac{f(y + t(d, 0)) - f(y)}{t} \ge \limsup_{k \in K} \frac{f(x_k + \Delta_k (d, 0)) - f(x_k)}{\Delta_k}.$$

The function $f$ is Lipschitz, hence finite, near $\hat{x}$. Since $d$ is a limit direction of $\hat{x}$, $x_k + \Delta_k(d, 0) \in X$ infinitely often in $K$, and infinitely many of the right-hand quotients are defined. All of these quotients must be nonnegative, since, for each $k \in K$, $x_k$ is a mesh local optimizer. ■

Theorem 5.15 Let $\hat{x}$, $\hat{y} \in \mathcal{N}(\hat{x})$, and $\hat{z}$ be defined as in the statement of Theorem 5.12, with $\mathcal{N}$ continuous at $\hat{x}$, and let $\xi > 0$ denote a lower bound on the extended poll triggers $\xi_k$ for all $k$. If $f(\hat{y}) < f(\hat{x}) + \xi$ and $f$ is Lipschitz near $\hat{z}$ with respect to the continuous variables, then $f^\circ(\hat{z}; (d, 0)) \ge 0$ for all limit directions $d \in D$ of $\hat{z}$.

Proof. From Definition 3.9, we have

$$f^\circ(\hat{z}; (d, 0)) = \limsup_{y \to \hat{z},\ t \downarrow 0} \frac{f(y + t(d, 0)) - f(y)}{t} \ge \limsup_{k \in K} \frac{f(y_k^{J_k} + \Delta_k (d, 0)) - f(y_k^{J_k})}{\Delta_k}.$$

The function $f$ is Lipschitz, hence finite, near $\hat{z}$. Since $f(\hat{y}) < f(\hat{x}) + \xi$ ensures that extended polling was triggered around $y_k \in \mathcal{N}(x_k)$ for all $k \in K$, and since $d$ is a limit direction of $\hat{z}$, it follows that $y_k^{J_k} + \Delta_k(d, 0) \in X$ infinitely often in $K$, and infinitely many of the right-hand quotients are defined. All of these quotients must be nonnegative, since, for each $k \in K$, $y_k^{J_k}$ is an extended poll endpoint. ■
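The difference quotients used in the proofs of Theorems 5.14 and 5.15 can be seen on a one-dimensional nonsmooth example. The function, the choice of mesh sizes $2^{-k}$, and the constant sequence of mesh local optimizers at the origin are all assumptions of this sketch.

```python
def f(x):
    return max(x, -2.0 * x)      # Lipschitz, not differentiable at 0


# f(0) <= f(+/-delta) for every delta > 0, so x_k = 0 is a mesh local
# optimizer at every mesh size, with limit point x_hat = 0.
x_hat = 0.0
quotients = {}
for d in (1.0, -1.0):
    quotients[d] = [(f(x_hat + 2.0 ** -k * d) - f(x_hat)) / 2.0 ** -k
                    for k in range(1, 11)]

print(quotients)
```

Every quotient is nonnegative, which is the property that bounds the generalized directional derivative from below in both directions even though the gradient does not exist at the limit point.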


If there are no constraints on the continuous variables, then there will always be a positive spanning set $D$ of directions that satisfy the hypothesis of Theorem 5.14. However, for bound or linear constraints, in order to guarantee the existence of any directions that satisfy the hypothesis, $D$ must contain a sufficiently rich set of directions so that poll sets can be constructed with directions $D_k \subset D$ that conform to the geometry of the constraints (see Definition 4.3).

In the following two theorems, we show that $\hat{x}$ and certain $\hat{z}$ satisfy first-order necessary conditions for optimality with respect to the continuous variables, if $f$ satisfies three conditions there (see Theorem 3.21), and if the rule for selecting poll directions conforms to $X^c$ (Assumption A3). Recall that Definition 3.17 defines the term D-regular.


Theorem 5.16 Let $\hat{x}$ be a limit point of a refining subsequence with limit directions $D(\hat{x})$, and let $f$ satisfy the following assumptions:

1. $f$ is Lipschitz near $\hat{x}$ with respect to the continuous variables,

2. $f$ is differentiable at $\hat{x}$ with respect to the continuous variables,

3. $f$ is $D(\hat{x})$-regular at $\hat{x}$ with respect to the continuous variables.

Then under Assumptions A1-A3, $\nabla^c f(\hat{x})^T w \ge 0$ for all $w \in T_{X^c}(\hat{x})$ and $-\nabla^c f(\hat{x}) \in N_{X^c}(\hat{x})$. Thus, $\hat{x}$ is a KKT point with respect to the continuous variables. Furthermore, if $X^c = \mathbb{R}^{n^c}$ or if $\hat{x}^c$ lies in the interior of $X^c$, then $0 = \nabla^c f(\hat{x}) \in \partial^c f(\hat{x})$.

Proof. Since Assumption A3 ensures that the rule for selecting $D_k$ conforms to $X^c$ for some $\epsilon > 0$, and since there are finitely many linear constraints, $D(x_k) \to D(\hat{x})$, and $D(\hat{x})$ positively spans $T_{X^c}(\hat{x})$. By Theorem 5.14, we have $f^\circ(\hat{x}; (d, 0)) \ge 0$ for all $d \in D(\hat{x})$, and the result follows directly from Theorem 3.21. ■

Theorem 5.17 Let $\hat{x}$, $\hat{y} \in \mathcal{N}(\hat{x})$, and $\hat{z}$ be defined as in the statement of Theorem 5.12, with $\mathcal{N}$ continuous at $\hat{x}$, and let $\xi > 0$ denote a lower bound on the extended poll triggers $\xi_k$ for all $k$. Let $D(\hat{z})$ be the limit directions of $\hat{z}$, and suppose $f$ satisfies the following assumptions:

1. $f$ is Lipschitz near $\hat{z}$ with respect to the continuous variables,

2. $f$ is differentiable at $\hat{z}$ with respect to the continuous variables,

3. $f$ is $D(\hat{z})$-regular at $\hat{z}$ with respect to the continuous variables.

If $f(\hat{y}) < f(\hat{x}) + \xi$, then under Assumptions A1-A4, $\nabla^c f(\hat{z})^T w \ge 0$ for all $w \in T_{X^c}(\hat{z})$. Furthermore, if $X^c = \mathbb{R}^{n^c}$ or $\hat{z}^c$ lies in the interior of $X^c$, then $0 = \nabla^c f(\hat{z}) \in \partial^c f(\hat{z})$.

Proof. Since Assumption A3 ensures that the rule for selecting $D_k$ conforms to $X^c$ for some $\epsilon > 0$, and since there are finitely many linear constraints, $D(z_k) \to D(\hat{z})$, and $D(\hat{z})$ positively spans $T_{X^c}(\hat{z})$. By Theorem 5.15, we have $f^\circ(\hat{z}; (d, 0)) \ge 0$ for all $d \in D(\hat{z})$, and the result follows directly from Theorem 3.21. ■

One special case must still be addressed; namely, the case where $f(\hat{x}) = f(\hat{y}) = f(\hat{z})$, but $\hat{y} \ne \hat{z}$. The following theorem, due to Audet and Dennis [8] (except for the very last assertion on the length of the EXTENDED POLL step, which is new), addresses this case. It shows that there are an infinite number of limit points of extended poll points with the same value as $f(\hat{x})$.

Theorem 5.18 Let $\hat{x}$, $\hat{y} \in \mathcal{N}(\hat{x})$, and $\hat{z}$ be defined as in the statement of Theorem 5.12, with $\mathcal{N}$ continuous at $\hat{x}$. If $f(\hat{y}) = f(\hat{x})$ and $f$ is continuous at $\hat{x}$ and $\hat{y}$ with respect to the continuous variables, then under Assumptions A1-A4, $f(\bar{y}) = f(\hat{x})$ for any limit point $\bar{y} = (\bar{y}^c, \hat{y}^d)$ of the sequence of extended poll points $\{y_k^j\}_{k \in K}$. Moreover, if $\hat{y} \ne \hat{z}$, then there are infinitely many of these limit points $\bar{y}$, and the index $J_k$ of the extended poll endpoint $y_k^{J_k}$ at iteration $k$ satisfies $J_k \to +\infty$ (in $K$).

Proof. Let $\hat{y}$ in $\mathcal{N}(\hat{x})$ be such that $f(\hat{y}) = f(\hat{x})$. Let $\bar{y}$ be a limit point distinct from $\hat{y}$ and $\hat{z}$ of the sequence of extended poll points $\{y_k^j\}_{k \in K}$.

Since $f(\hat{x}) \le f(y_k^{j+1}) \le f(y_k^j)$ for $j = 0, 1, \ldots, J_k$, and since continuity of $f$ at $\hat{x}$ and $\hat{y}$ ensures that $\{f(y_k)\}_{k \in K}$ converges to $f(\hat{x})$, it follows that $f(\hat{x}) = f(\bar{y})$.

To show the second part of the result, we first let $s = \|\hat{y} - \hat{z}\|$ be the nonzero distance between $\hat{y}$ and $\hat{z}$ (which is well-defined, since both points share the same discrete components). Second, for any scalar $\rho$ in the open interval $(0, s)$, we define the set

$$Y_\rho = \{ y_k^j : k \in K,\ j \in \{0, 1, \ldots, J_k\},\ \|y_k^j - \hat{y}\| < \rho,\ \|y_k^{j+1} - \hat{y}\| \ge \rho \}.$$

Since $y_k^0 = y_k \to \hat{y}$ and $y_k^{J_k} = z_k \to \hat{z}$, it follows that the set $Y_\rho$ contains infinitely many points for any $\rho$ in $(0, s)$. Any limit point $\bar{y}_\rho$ of $Y_\rho$ satisfies $\|\bar{y}_\rho - \hat{y}\| = \rho$, since $\Delta_k$ converges to 0 (in $K$) and $y_k^{j+1} = y_k^j + \Delta_k d_k^j$ for some vector $d_k^j \in D_k(y_k^j)$. Therefore, if $\rho \ne q$, then $\bar{y}_\rho \ne \bar{y}_q$ and the result follows.

Finally, since $\Delta_k \to 0$ (in $K$) and

$$0 < \lim_{k \in K} \|y_k^{J_k} - y_k^0\| = \lim_{k \in K} \Big\| \sum_{j=0}^{J_k - 1} (y_k^{j+1} - y_k^j) \Big\| \le \lim_{k \in K} \sum_{j=0}^{J_k - 1} \|y_k^{j+1} - y_k^j\| = \lim_{k \in K} \sum_{j=0}^{J_k - 1} \|y_k^j + \Delta_k d_k^j - y_k^j\| = \lim_{k \in K} \Delta_k \sum_{j=0}^{J_k - 1} \|d_k^j\|,$$

it follows that the sum is unbounded; thus, $J_k \to +\infty$ (in $K$). ■

The necessary optimality conditions shown in Theorems 5.16 and 5.17 for $\hat{x}$ and $\hat{z}$, respectively, do not necessarily hold for $\bar{y}$. In fact, an interesting counterexample is found in [8]. However, this result can be obtained for $\bar{y}$ by modifying the algorithm.


5.3.4  Stronger  Results 

In  [130],  Torczon  obtains  stronger  convergence  results  for  NLP  problems  if  the  POLL 
step  is  exhaustive;  i.e.,  if  every  feasible  point  in  the  poll  set  is  evaluated,  even  if 
an  improved  mesh  point  has  already  been  found.  If  found,  the  best  improved  mesh 
point  becomes  the  next  iterate.  In  a  similar  manner,  Audet  and  Dennis  [8]  show  that 
Theorem  5.17  can  be  strengthened  if  the  algorithm  is  modified  to  include  a  stronger, 
but  more  expensive  EXTENDED  POLL  step: 

STRONG EXTENDED POLL step (at iteration $k$):

• Given $y_k^0$, set $y_k^{j+1} \in \arg\min_{y \in P_k(y_k^j)} f(y)$, $j = 0, 1, \ldots, J_k$,

• The same positive basis $D_k(y_k^j) \subset D(y_k^j)$ must be used throughout the step.

That is, instead of performing the EXTENDED POLL step only until an improvement is made, all the points in the extended poll set are evaluated, and the one with lowest objective function value is chosen (ties are broken arbitrarily) as the poll center for the next EXTENDED POLL step (provided it is an improvement).
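The difference between the standard and the STRONG EXTENDED POLL step can be sketched as follows. The sketch makes several assumptions: only the continuous part of the points is represented (the discrete part is fixed and omitted), the poll set uses the four coordinate directions with a fixed unit mesh size, the standard step is modeled as taking the first improving poll point in a fixed order, and the names `poll_set` and `extended_poll_centers` are hypothetical.

```python
def poll_set(center, delta=1.0):
    x, y = center
    return [(x + delta, y), (x - delta, y), (x, y + delta), (x, y - delta)]


def extended_poll_centers(f, y0, f_incumbent, strong=False):
    """Return the successive poll centers y^0, y^1, ... of one extended poll."""
    centers = [y0]
    while True:
        current = centers[-1]
        improving = [p for p in poll_set(current) if f(p) < f(current)]
        if not improving:
            return centers                   # last center is the endpoint
        # Standard step: first improving point; strong step: best poll point.
        nxt = min(improving, key=f) if strong else improving[0]
        centers.append(nxt)
        if f(nxt) < f_incumbent:
            return centers                   # improved mesh point found


def f(point):
    x, y = point
    return x ** 2 + 10.0 * y ** 2


standard = extended_poll_centers(f, (2.0, 2.0), f_incumbent=0.0)
strong = extended_poll_centers(f, (2.0, 2.0), f_incumbent=0.0, strong=True)
print(standard)
print(strong)
```

Both variants reach the same endpoint here, but the strong variant moves first along the steeper coordinate because it compares every poll point before choosing the next center.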

The final two theorems show that, if the STRONG EXTENDED POLL step is used, then the same results from Theorems 5.16 and 5.17 that apply to $\hat{x}$ and $\hat{z}$, respectively, apply also to $\bar{y}$. The first result assumes only Lipschitz continuity near $\bar{y}$, while the second assumes strict differentiability at $\bar{y}$.

Theorem 5.19 Let $\hat{x}$ and $\hat{y}$ be defined as in the statement of Theorem 5.12, with $\mathcal{N}$ continuous at $\hat{x}$. Suppose that $f(\hat{y}) = f(\hat{x})$, and let $\bar{y}$ be a limit point of strong extended poll points $\{y_k^{j_k}\}_{k \in K}$, where $j_k$ is arbitrary. Under Assumptions A1-A3, if $f$ is Lipschitz near $\bar{y}$ with respect to the continuous variables, then $f^\circ(\bar{y}; (d, 0)) \ge 0$ for all $d \in D$ along which $f$ was evaluated in a STRONG EXTENDED POLL step infinitely often (in $K$).

Proof. Suppose by way of contradiction that there exists a $d \in D$ such that $f^\circ(\bar{y}; (d, 0)) < 0$. Then from Definition 3.9, we have

$$\limsup_{y \to \bar{y},\ t \downarrow 0} \frac{1}{t}\left[f(y + t(d, 0)) - f(y)\right] < 0.$$

By local Lipschitz continuity, there exists an $\epsilon > 0$ such that $f(\bar{y} + t(d, 0)) < f(\bar{y})$ for all $t \in (0, \epsilon)$. Furthermore, without loss of generality, for each $t \in (0, \epsilon)$, there exists an $\epsilon_t > 0$ such that $f(\bar{y} + t(d, 0) + s(b, 0)) < f(\bar{y})$ for all $s \in (0, \epsilon_t)$ and for all $b \in \mathbb{R}^{n^c}$ with $\|b\| = 1$.

Choose $k \in K$ sufficiently large such that $\|y_k^{j_k} - \bar{y}\| < \epsilon_t$ and $\Delta_k < \epsilon$. Now observe that $y_k^{j_k} = \bar{y} + s(b, 0)$ for some $s \in \mathbb{R}$ and $b \in \mathbb{R}^{n^c}$ with $\|b\| = 1$. Then $s = s\|(b, 0)\| = \|y_k^{j_k} - \bar{y}\| < \epsilon_t$. Use of the STRONG EXTENDED POLL step (with $j_k < J_k$) guarantees that $f(y_k^{j_k + 1}) \le f(y_k^{j_k} + \Delta_k(d, 0)) = f(\bar{y} + s(b, 0) + \Delta_k(d, 0)) = f(\bar{y} + \Delta_k(d, 0) + s(b, 0)) < f(\bar{y})$, since $\Delta_k$ is a specific case of $t \in (0, \epsilon)$ and $s \in (0, \epsilon_t)$. Thus, $f(y_k^{j_k + 1}) < f(\bar{y}) = f(\hat{x}) \le f(x_k)$, which contradicts the assumption that $x_k$ is a mesh local optimizer. ■

Theorem 5.20 Let the assumptions of Theorem 5.19 hold. Under Assumptions A1-A3, if $f$ satisfies the assumptions of Theorem 5.16 for $\bar{y}$, then $\nabla^c f(\bar{y})^T w \ge 0$ for all $w \in T_{X^c}(\bar{y})$. If $X^c = \mathbb{R}^{n^c}$ or if $\bar{y}$ lies in the interior of $X^c$, then $0 = \nabla^c f(\bar{y}) \in \partial^c f(\bar{y})$.

Proof. Since Theorem 5.19 applies to a set of directions that positively span $T_{X^c}(\bar{y})$, the result follows directly from it and Theorem 3.21. ■

5.4  Summary 

The new convergence results presented here show a further characterization of MGPS limit points, consistent with earlier such analyses of the GPS [6] and filter GPS [7] algorithms. In the next chapter, the FMGPS algorithm for general nonlinearly constrained MVP problems is introduced, along with convergence results developed in a similar manner. However, before doing so, it is helpful to summarize what has been proved and its significance, specifically in the case where first-order necessary optimality conditions are proved.

First,  recall  that  in  Definition  5.3,  we  provided  a  definition  of  local  optimality 
with  respect  to  a  mixed  variable  domain.  This  definition  is  stronger  than  simply 
requiring  optimality  with  respect  to  its  continuous  variables  and  also  with  respect 
to its discrete neighbors. It also requires that the optimal point have an objective function value no higher than that of any point in a neighborhood of any of its discrete neighbors.

Given the continuity of the set-valued function $\mathcal{N}$ at $\hat{x}$ and the three assumptions on $f$ listed in the statements of Theorems 5.16, 5.17, and 5.20, we have proved the following toward achieving the goal of local optimality:

• The limit point $\hat{x}$ achieves a weaker measure of first-order optimality, in that $\hat{x}$ satisfies first-order KKT necessary conditions for optimality with respect to its continuous variables, and it has a function value at least as low as its discrete neighbors.

• For any $\hat{y} \in \mathcal{N}(\hat{x})$ that corresponds to a subsequence that did not trigger extended polling infinitely often, it is trivial to show that $f(\hat{x}) < f(\hat{y})$, in which case, continuity of the objective means that $f(\hat{x}) < f(v)$ for all $v$ in a neighborhood of $\hat{y}$. This is clearly a stronger condition for optimality.

• For any $\hat{y} \in \mathcal{N}(\hat{x})$ that corresponds to a subsequence that triggered extended polling infinitely often, the limit point $\hat{z}$ satisfies $f(\hat{x}) \le f(\hat{z}) \le f(\hat{y})$, and $\hat{z}$ satisfies the KKT necessary conditions for optimality with respect to its continuous variables. Thus, $f$ has no feasible descent directions at $\hat{z}$. This is about as close as we can come to establishing and satisfying a characterization of first-order optimality conditions for mixed variable problems.

• The STRONG EXTENDED POLL step takes care of the special case where $f(\hat{y}) = f(\hat{x})$, but $\hat{y} \ne \hat{z}$. In this case, under the normal algorithm, it is possible to have a limit point $\bar{y}$ of intermediate extended poll points which has descent directions not detected by the algorithm. Although this scenario is highly unlikely in practice, a simple example in which this occurs can be found in [8]. The stronger version of the algorithm alleviates this problem.

Chapter 6

GPS for General Constrained MVP Problems


In this chapter, we introduce a new class of filter GPS algorithms for mixed variable optimization problems with general nonlinear constraints. It is constructed by merging the ideas of the previous two chapters in a way that encompasses the GPS algorithms described there, so that the convergence results of Chapters 4 and 5 become corollaries of the results presented in this chapter.

As discussed in Chapters 1 and 5, the MVP problem formulation is complicated by the fact that changes in the discrete variables can mean a change in the constraints, and even a change in problem dimension. Thus, we denote $n^c$ and $n^d$ as the maximum dimensions of the continuous and discrete variables, respectively, and we partition each point $x = (x^c, x^d)$ into continuous variables $x^c \in \mathbb{R}^{n^c}$ and discrete variables $x^d \in \mathbb{Z}^{n^d}$. Again, we adopt the convention of ignoring unused variables.

The  problem  under  consideration,  which  is  restated  from  (1.1),  can  be  expressed 
as  follows: 


$$\min_{x \in X} f(x) \quad \text{s.t.} \quad C(x) \le 0, \tag{6.1}$$


where $f : X \to \mathbb{R} \cup \{\infty\}$, and $C : X \to (\mathbb{R} \cup \{\infty\})^p$ with $C = (C_1, C_2, \ldots, C_p)$.
The domain $X = X^c \times X^d$ is partitioned into continuous and discrete variable spaces
$X^c \subseteq \mathbb{R}^{n^c}$ and $X^d \subseteq \mathbb{Z}^{n^d}$, respectively, where $X^c$ is defined by a finite set of bound
and linear constraints, dependent on the values of $x^d$. That is,

$$X^c(x^d) = \{x^c \in \mathbb{R}^{n^c} : \ell(x^d) \le A(x^d) x^c \le u(x^d)\}, \tag{6.2}$$

where $A(x^d) \in \mathbb{R}^{m^c \times n^c}$ is a real matrix, $\ell(x^d), u(x^d) \in (\mathbb{R} \cup \{\pm\infty\})^{n^c}$, and $\ell(x^d) \le u(x^d)$
for all values of $x^d$.

Once again, although we allow equality constraints in the theory, they cause
significant numerical difficulties. Thus, in practice, variables ought to be eliminated
until all the linear constraints can be expressed in a way that satisfies $\ell(x^d) < u(x^d)$
for all values of $x^d$.

Note that this formulation is indeed a generalization of the standard NLP problem,
in that, if $n^d = 0$, then the problem reduces to a standard NLP problem, in which $\ell$,
$A$, $u$, and $C$ do not change. For convenience, we now omit the explicit dependence
on $x^d$ in this chapter, even though it is understood for MVP problems.
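The dependence of the continuous domain on the discrete variables in (6.2) can be illustrated with a minimal Python sketch. The table `DOMAINS`, the function names, and the numerical data are hypothetical and chosen only for illustration; each discrete value selects its own matrix and bounds, and membership is tested row by row.

```python
import math

def in_continuous_domain(xc, A, lower, upper):
    """Return True if lower <= A @ xc <= upper holds for every row of A."""
    for row, lo, hi in zip(A, lower, upper):
        value = sum(a * x for a, x in zip(row, xc))
        if value < lo or value > hi:
            return False
    return True

# Hypothetical data: each discrete value x^d selects its own A, l, and u.
DOMAINS = {
    0: {"A": [[1.0, 0.0], [0.0, 1.0]],
        "lower": [0.0, 0.0],
        "upper": [1.0, math.inf]},
    1: {"A": [[1.0, 1.0]],
        "lower": [-math.inf],
        "upper": [2.0]},
}

def is_in_X(xc, xd):
    """Membership test for X^c(x^d), with x^d used as a key into DOMAINS."""
    dom = DOMAINS[xd]
    return in_continuous_domain(xc, dom["A"], dom["lower"], dom["upper"])
```

The same continuous point can therefore be feasible for one setting of the discrete variable and infeasible for another, which is why the bounds are looked up per discrete value rather than stored once.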

6.1  The  FMGPS  Algorithm 

For the new class of algorithms, since we are adding the filter construct to the mixed
variable algorithm, we must redefine the mesh and poll set in terms of a poll center, as
opposed to a current iterate $x_k$ (see Chapter 5). That is, at each iteration $k$, the poll
center $p_k \in \{p_k^F, p_k^I\}$ is chosen as either the incumbent best feasible point $p_k^F$ or the
incumbent least infeasible point $p_k^I$. The mesh is constructed as the direct product of
$X^d$ with the union of a finite number of lattices, but translated from the poll center;
i.e.,

$$M_k = X^d \times \bigcup_{i=1}^{i_{\max}} \{p_k^c + \Delta_k D^i z \in X^c : z \in \mathbb{Z}_+^{|D^i|}\}, \tag{6.3}$$

where $D^i$ denotes a set of positive spanning directions corresponding to the $i$-th set of
discrete variable values. Each set $D^i$ (which is represented as a matrix whose columns
are members of the set) is constructed in exactly the same manner as described in
Section 5.1, and is often written as $D(x)$ to indicate that it corresponds to the discrete
variable values of $x \in X$.

The set of discrete neighbors is again constructed in terms of a set-valued function
$\mathcal{N} : X \to 2^X$, with the convention that, for each $x \in X$, $x \in \mathcal{N}(x)$, which is finite.
Continuity of $\mathcal{N}$ (see Definition 5.2) will be assumed at certain limit points of the
algorithm in order to obtain some of the convergence results that follow.

The poll set centered at $p_k$ is defined as

$$P_k(p_k) = \{p_k + \Delta_k (d, 0) \in X : d \in D_k(p_k)\}, \tag{6.4}$$

where the set $D_k(p_k) \subseteq D^i$ for some $1 \le i \le i_{\max}$ is a positive spanning set of
poll directions.
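A minimal Python sketch of the poll set in (6.4) follows. It assumes a point is stored as a pair (continuous part, discrete part) and that the caller supplies a membership test for $X$; the names `poll_set` and `in_domain` are hypothetical. Only the continuous part is perturbed, which corresponds to the step $(d, 0)$.

```python
def poll_set(center_c, center_d, delta, directions, in_domain):
    """Return the points p + delta * (d, 0) that lie in the domain X.

    center_c   -- continuous part of the poll center (sequence of floats)
    center_d   -- discrete part of the poll center (left unchanged)
    delta      -- mesh size parameter
    directions -- iterable of direction vectors d (a positive spanning set)
    in_domain  -- callable (xc, xd) -> bool, an assumed membership test for X
    """
    points = []
    for d in directions:
        trial_c = tuple(p + delta * di for p, di in zip(center_c, d))
        if in_domain(trial_c, center_d):
            points.append((trial_c, center_d))
    return points
```

For example, with the coordinate directions and their negatives, a poll center on the boundary of a nonnegativity constraint produces one fewer poll point, because the step leaving the domain is discarded rather than evaluated.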

Because the filter seeks a better point with respect to either of the two functions
(the objective function $f$ and the constraint violation function $h$), a change must
be made to the rule for selecting the discrete neighbors about which to perform an
EXTENDED POLL step. Recall that in the MGPS algorithm, extended polling is
performed around any discrete neighbor whose objective function value is sufficiently
close to that of the current iterate (i.e., "almost" an improved mesh point). With
the addition of nonlinear constraints to the problem, we now require a notion of a
discrete neighbor "almost" generating a new incumbent best feasible point or least
infeasible point.

While there is no single workable approach to this issue, the implementation
here has the desirable property of being a generalization of the one described in
Chapter 5. At iteration $k$, let $f_k^F = f(p_k^F)$ denote the objective function value of the
incumbent best feasible point, and let $h_k^I = h(p_k^I)$ be the constraint violation function
value of the incumbent least infeasible point. Given current poll center $p_k$ and
user-specified extended poll triggers $\xi_k^f \ge \xi > 0$ and $\xi_k^h \ge \xi > 0$ for $f$ and $h$, respectively
(where $\xi$ is a positive constant), we perform an EXTENDED POLL step around any
discrete neighbor $y_k \in \mathcal{N}(p_k)$ satisfying either $0 < h_k^I < h(y_k) < \min(h_k^I + \xi_k^h, h_{\max})$,
or $h(y_k) = 0$ with $f_k^F < f(y_k) < f_k^F + \xi_k^f$. The extended poll triggers $\xi_k^f$ and $\xi_k^h$ can
also be set according to the categorical variable values associated with the current
poll center, but this dependency is not included in the notation, so as not to obfuscate
the ideas presented here.
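The trigger rule can be written as a small predicate. The following Python sketch is an illustration only: the function and argument names are hypothetical, and the strictness of each inequality is an assumption that should be checked against the conditions stated above.

```python
def triggers_extended_poll(f_y, h_y, f_best_feasible, h_least_infeasible,
                           xi_f, xi_h, h_max):
    """Decide whether a discrete neighbor y warrants an extended poll.

    f_y, h_y            -- objective and constraint violation values at y
    f_best_feasible     -- objective value of the incumbent best feasible point
    h_least_infeasible  -- violation value of the incumbent least infeasible point
    xi_f, xi_h          -- extended poll triggers for f and h
    h_max               -- upper bound on acceptable constraint violation
    """
    if h_y == 0:
        # Feasible neighbor: "almost" a new best feasible point.
        return f_best_feasible < f_y < f_best_feasible + xi_f
    # Infeasible neighbor: "almost" a new least infeasible point.
    upper = min(h_least_infeasible + xi_h, h_max)
    return 0 < h_least_infeasible < h_y < upper
```

Keeping the feasible and infeasible branches separate mirrors the two alternatives in the text: a feasible neighbor is compared only on $f$, and an infeasible neighbor only on $h$.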

Similar to the GMVP algorithm of Chapter 5, the EXTENDED POLL step generates
a sequence of EXTENDED POLL centers $\{y_k^j\}_{j=0}^{J_k}$, beginning with $y_k^0 = y_k$ and
ending with the extended poll endpoint, $y_k^{J_k} = z_k$.

Thus, at iteration $k$, the set of all points evaluated in the EXTENDED POLL step,
denoted $\mathcal{X}_k(\xi_k^f, \xi_k^h)$, is

$$\mathcal{X}_k(\xi_k^f, \xi_k^h) = \bigcup_{y \in \mathcal{N}_k^f \cup \mathcal{N}_k^h} \mathcal{E}(y), \tag{6.5}$$

where $\mathcal{E}(y)$ denotes the set of extended poll points, and

$$\mathcal{N}_k^f = \{y \in \mathcal{N}(p_k) : h(y) = 0,\ f_k^F < f(y) < f_k^F + \xi_k^f\}, \tag{6.6}$$

$$\mathcal{N}_k^h = \{y \in \mathcal{N}(p_k) : 0 < h_k^I < h(y) < \min(h_k^I + \xi_k^h, h_{\max})\}. \tag{6.7}$$

The set of trial points is defined as $T_k = S_k \cup P_k(p_k) \cup \mathcal{N}(p_k) \cup \mathcal{X}_k(\xi_k^f, \xi_k^h)$, where $S_k$
is the finite set of mesh points evaluated during the SEARCH step. Similar to the Filter
GPS algorithm for NLP problems, described in Section 4.2, the following definitions
define the two outcomes of the SEARCH, POLL, and EXTENDED POLL steps.

Definition 6.1 Let $T_k$ denote the set of trial points to be evaluated at
iteration $k$, and let $\overline{\mathcal{F}}_k$ denote the set of filtered points described by (4.8).
A point $y \in T_k$ is said to be an unfiltered point if $y \notin \overline{\mathcal{F}}_k$.

Definition 6.2 Let $P_k(p_k)$ denote the poll set centered at the point $p_k$,
and let $\overline{\mathcal{F}}_k$ denote the set of filtered points described by (4.8). The point $p_k$
is said to be a mesh isolated filter point if $P_k(p_k) \cup \mathcal{N}(p_k) \cup \mathcal{X}_k(\xi_k^f, \xi_k^h) \subset \overline{\mathcal{F}}_k$.
If $p_k \in \{p_k^F, p_k^I\}$, then $p_k$ is said to be a mesh isolated poll center.

Figure 6.1 depicts a filter on a bi-loss graph, in which the best feasible and least
infeasible solutions are indicated, and the feasible solutions lie on the vertical axis
(labelled $f$). The dashed lines have been added to indicate the areas for which an
EXTENDED POLL step is triggered. If a feasible discrete neighbor has an objective
function value higher on the axis than the current incumbent, but lower than the
horizontal dashed line, an EXTENDED POLL step is performed around this discrete
neighbor. Similarly, an EXTENDED POLL step is performed if a filtered discrete
neighbor lies to the right of the current least infeasible solution, but left of the vertical
dashed line.

Figure 6.1 MVP Filter and Extended Poll Triggers.

Recall that in the Filter GPS algorithm presented in Chapter 4, the goal of each
iteration is to find an unfiltered point. This is the case here as well, but the details
of when to continue an EXTENDED POLL step must be generalized from the simple
decrease condition in $f$ under which the MGPS algorithm operates. More specifically,
if the EXTENDED POLL step finds an unfiltered point, it is added to the filter, the poll
center is updated (if appropriate), and the mesh is coarsened according to the rule in
(4.5). If the EXTENDED POLL step fails to find a new point $y$ satisfying $y \in \mathcal{N}_k^f \cup \mathcal{N}_k^h$,
then the current incumbent poll center $p_k$ is declared to be a mesh isolated filter
point, the current poll center is retained, and the mesh is refined according to the
rule in (4.6). However, the unanswered question is how to treat extended poll points
that are filtered, yet still belong to $\mathcal{N}_k^f$ or $\mathcal{N}_k^h$.
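The notions of filtered points and mesh isolation (Definitions 6.1 and 6.2) can be sketched in Python as follows. The precise set of filtered points is given by (4.8), which is not reproduced in this chapter, so the test below rests on an assumption: it uses the customary filter rule, in which a point is filtered if its violation reaches $h_{\max}$, if it is feasible but no better than the best feasible incumbent, or if some filter entry is at least as good in both $f$ and $h$. All names are hypothetical.

```python
def is_filtered(f_x, h_x, filter_points, f_best_feasible, h_max):
    """Assumed filter test on the pair (f(x), h(x)).

    filter_points    -- list of (f, h) pairs for infeasible filter entries
    f_best_feasible  -- objective value of the best feasible incumbent
    h_max            -- upper bound on acceptable constraint violation
    """
    if h_x >= h_max:
        return True
    if h_x == 0:
        return f_x >= f_best_feasible
    # Dominated: some filter entry is no worse in both measures.
    return any(f_p <= f_x and h_p <= h_x for f_p, h_p in filter_points)

def is_mesh_isolated(trial_values, filter_points, f_best_feasible, h_max):
    """True if every evaluated trial point (f, h) is filtered."""
    return all(is_filtered(f, h, filter_points, f_best_feasible, h_max)
               for f, h in trial_values)
```

Under this assumed rule an iteration succeeds as soon as one trial point fails the `is_filtered` test, and the poll center is mesh isolated only when every trial point passes it.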

In order to handle this case, we establish the notion of a temporary local filter. At
iteration $k$, for each discrete neighbor $y_k$, a local filter $\mathcal{F}_k^L(y_k)$ is simply a filter relative
to the current EXTENDED POLL step. It is populated initially with only the point
$y_k$ and with $h_{\max}^L = \min(h_k^I + \xi_k^h, h_{\max})$. As with the MGPS algorithm, the extended
poll sequence $\{y_k^j\}_{j=0}^{J_k}$ begins with $y_k^0 = y_k$ and ends with $z_k = y_k^{J_k}$, where each $y_k^j$ is
the poll center of the local filter, chosen either as the best feasible or least infeasible
point, relative to the local filter. Extended polling with respect to $y_k$ proceeds, with
points being added to the local filter as appropriate, until no more unfiltered mesh
points can be found with respect to the new local filter, or until an unfiltered point
is found with respect to the main filter. Once either of these conditions is satisfied,
the EXTENDED POLL step ends, and the main filter is appropriately updated with
the points of the local filter, which is then discarded. The mesh size parameter $\Delta_k$,
which has been kept constant throughout the step, is then updated, depending on
the success of the SEARCH, POLL, and EXTENDED POLL steps in finding an unfiltered
point with respect to the main filter.

The EXTENDED POLL step is summarized by the algorithm in Figure 6.2, and the
FMGPS Algorithm is summarized in Figure 6.3.
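The local filter loop for a single discrete neighbor can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the thesis implementation: the callables `evaluate`, `poll_points`, and `main_filtered` are hypothetical, and the next extended poll center is taken to be the most recently accepted point, whereas the text chooses the best feasible or least infeasible point of the local filter.

```python
def extended_poll(y0, evaluate, poll_points, main_filtered, h_local_max):
    """Extended poll around one discrete neighbor y0 with a local filter.

    evaluate(x)         -> (f, h) values at x
    poll_points(x)      -> list of poll points around x on the current mesh
    main_filtered(f, h) -> True if (f, h) is filtered by the main filter
    h_local_max         -- violation bound used by the local filter
    Returns ("unfiltered", w) or ("endpoint", z).
    """
    f0, h0 = evaluate(y0)
    local = [(f0, h0)]      # local filter starts with y0 only
    center = y0
    while True:
        accepted = False
        for w in poll_points(center):
            f_w, h_w = evaluate(w)
            if not main_filtered(f_w, h_w):
                return "unfiltered", w          # w becomes the next iterate
            dominated = any(f <= f_w and h <= h_w for f, h in local)
            if h_w < h_local_max and not dominated:
                local.append((f_w, h_w))        # unfiltered locally only
                center = w                      # simplified center update
                accepted = True
                break
        if not accepted:
            return "endpoint", center           # extended poll endpoint
```

The loop ends in one of the two ways described above: either a point is unfiltered with respect to the main filter, or no poll point is unfiltered with respect to the local filter and the current center is returned as the extended poll endpoint.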

6.2  Convergence  Results 

The convergence properties of the new algorithm are now presented in four sections.
First, the mesh size parameter $\Delta_k$ will be shown to have the same
behavior as in previous results, and a general characterization of limit points of certain
subsequences is given. Results for the constraint violation function and for the
objective function follow, similar to those found in [7]. Finally, stronger results for a
more specific implementation of the new algorithm are provided. These mimic those
found in Section 5.3.4, but apply to the more general MVP problem with nonlinear
constraints.

Extended Poll Step at Iteration $k$

Input: Current poll center $p_k$, filter $\mathcal{F}_k$, and extended poll triggers $\xi_k^f$ and $\xi_k^h$.

For each discrete neighbor $y_k \in \mathcal{N}_k^f \cup \mathcal{N}_k^h$ (see (6.6) and (6.7)), do the following:

• Initialize local filter $\mathcal{F}_k^L$ with $y_k$ and $h_{\max}^L = \min\{h_k^I + \xi_k^h, h_{\max}\}$. Set $y_k^0 = y_k$.

• For $j = 0, 1, 2, \ldots$

1. Evaluate $f$ and $h$ at points in $P_k(y_k^j)$ until a point $w$ is found that is
unfiltered with respect to $\mathcal{F}_k^L$, or until done.

2. If no point $w \in P_k(y_k^j)$ is unfiltered with respect to $\mathcal{F}_k^L$, then go to Next.

3. If a point $w$ is unfiltered with respect to $\mathcal{F}_k$, set $x_{k+1} = w$ and Quit.

4. If $w$ is filtered with respect to $\mathcal{F}_k$, but unfiltered with respect to $\mathcal{F}_k^L$, then
update $\mathcal{F}_k^L$ to include $w$, and compute new extended poll center $y_k^{j+1}$.

• Next: Discard $\mathcal{F}_k^L$ and process next $y_k \in \mathcal{N}_k^f \cup \mathcal{N}_k^h$.

Figure 6.2 Extended Poll Step for the FMGPS Algorithm

Filter Mixed Variable Generalized Pattern Search - FMGPS

Initialization: Let $x_0$ be an undominated point of a set of initial solutions. Include
all these points in the filter $\mathcal{F}_0$, with $h_{\max} > h(x_0)$. Fix $\xi > 0$ and $\Delta_0 > 0$.

For $k = 0, 1, 2, \ldots$, perform the following:

1. Update poll center $p_k \in \{p_k^F, p_k^I\}$, extended poll triggers $\xi_k^f \ge \xi$ and $\xi_k^h \ge \xi$.

2. Compute incumbent values $f_k^F = f(p_k^F)$, $h_k^I = h(p_k^I)$, $f_k^I = f(p_k^I)$.

3. SEARCH step: Employ some finite strategy seeking an unfiltered mesh point
$x_{k+1} \notin \overline{\mathcal{F}}_k$.

4. POLL step: If the SEARCH step did not find an unfiltered point, evaluate $f$
and $h$ at points in the poll set $P_k(p_k) \cup \mathcal{N}(p_k)$ until an unfiltered mesh point
$x_{k+1} \notin \overline{\mathcal{F}}_k$ is found, or until done.

5. EXTENDED POLL step: If SEARCH and POLL did not find an unfiltered point,
execute the algorithm in Figure 6.2 to continue looking for $x_{k+1} \notin \overline{\mathcal{F}}_k$.

6. Update: If SEARCH, POLL, or EXTENDED POLL finds an unfiltered point,
update filter $\mathcal{F}_{k+1}$ with $x_{k+1}$, and set $\Delta_{k+1} \ge \Delta_k$ according to (4.5);
otherwise, set $\mathcal{F}_{k+1} = \mathcal{F}_k$, and set $\Delta_{k+1} < \Delta_k$ according to (4.6).

Figure 6.3 FMGPS Algorithm
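The mesh size update in the final step of the FMGPS algorithm can be sketched in Python as follows. Rules (4.5) and (4.6) are not reproduced in this chapter, so the sketch assumes the common GPS form in which the mesh size is multiplied by a power of a fixed rational number: a nonnegative power after a successful iteration and a negative power otherwise. The names and default values `tau`, `w_plus`, and `w_minus` are hypothetical.

```python
from fractions import Fraction

def update_mesh_size(delta, found_unfiltered,
                     tau=Fraction(2), w_plus=0, w_minus=-1):
    """Return the next mesh size under an assumed tau-power update rule.

    delta             -- current mesh size (a Fraction, kept exact)
    found_unfiltered  -- True if SEARCH, POLL, or EXTENDED POLL succeeded
    tau               -- rational base greater than 1 (assumed)
    w_plus, w_minus   -- exponents for coarsening (>= 0) and refining (< 0)
    """
    exponent = w_plus if found_unfiltered else w_minus
    return delta * tau ** exponent
```

Exact rational arithmetic is used here so that repeated refinement and coarsening return to exactly the same mesh sizes, which is the property that keeps all iterates on a common underlying lattice.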


We make the following assumptions, consistent with those of previous GPS
algorithms:

A1: All iterates $\{x_k\}$ produced by the FMGPS algorithm lie in a compact set.

A2: For each fixed set of discrete variable values $x^d$, the corresponding set of
directions $D^i = G_i Z_i$, as defined in (5.3), includes tangent cone generators for every
point in $X^c(x^d)$.

A3: The rule for selecting directions $D_k$ conforms to $X^c$ for some $\epsilon > 0$ (see
Definition 4.3).

A4: The discrete neighbors always lie on the mesh; i.e., $\mathcal{N}(x_k) \subset M_k$ for all $k$.

6.2.1 Mesh Size Behavior and Limit Points

Since  the  behavior  of  the  mesh  size  is  governed  by  Assumptions  A1-A4,  the  following 
three  results  are  identical  to  Lemmas  5.7  and  5.8,  and  Theorem  5.9  in  Chapter  5. 
The  proofs  are  not  repeated  here. 


Lemma 6.3 For any integer $k \ge 0$, let $u$ and $v$ be any distinct points in
the mesh $M_k$ such that $u^d = v^d$. Then for any norm for which all nonzero
integer vectors have norm at least 1,

$$\|u^c - v^c\| \ge \frac{\Delta_k}{\|G_i^{-1}\|},$$

where the index $i$ corresponds to the discrete variable values of $u$ and $v$.
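The separation bound of Lemma 6.3 can be checked numerically on a small example. The Python sketch below assumes the simplest case, in which $G_i$ is the identity (so $\|G_i^{-1}\| = 1$) and the directions are integer vectors; the function names and the test data are hypothetical. It enumerates a finite portion of the lattice around a center and computes the smallest distance between distinct points.

```python
import itertools
import math

def mesh_points(center, delta, directions, max_coeff):
    """Enumerate center + delta * D z for integer z with 0 <= z_j <= max_coeff."""
    points = set()
    n = len(center)
    for z in itertools.product(range(max_coeff + 1), repeat=len(directions)):
        step = [sum(zj * d[i] for zj, d in zip(z, directions)) for i in range(n)]
        points.add(tuple(c + delta * s for c, s in zip(center, step)))
    return points

def min_separation(points):
    """Smallest Euclidean distance between two distinct points."""
    return min(math.dist(u, v) for u, v in itertools.combinations(points, 2))
```

With integer directions and $G_i = I$, any two distinct enumerated points differ by $\Delta_k$ times a nonzero integer vector, so the computed separation cannot fall below $\Delta_k$.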


Lemma 6.4 There exists a positive integer $r^u$ such that $\Delta_k \le \Delta_0 \tau^{r^u}$
for any $k \ge 0$.

Theorem 6.5 The mesh size parameters satisfy $\liminf_{k \to +\infty} \Delta_k = 0$.

We  will  again  be  interested  in  refining  subsequences;  thus  the  following  two  def¬ 
initions  are  provided.  The  first  is  identical  to  Definition  4.15,  while  the  second  is 
modified  from  that  of  Definition  4.16  to  include  additional  limit  points. 

Definition 6.6 A subsequence of mesh isolated poll centers $\{p_k\}_{k \in K}$ (for
some subset of indices $K$) is said to be a refining subsequence if $\{\Delta_k\}_{k \in K}$
converges to zero.

Definition 6.7 Let $\{v_k\}_{k \in K}$ be either a refining subsequence or a
corresponding subsequence of extended poll endpoints, and let $\hat{v}$ be a limit
point of the subsequence. A direction $d \in D$ is said to be a limit direction
of $\hat{v}$ if $v_k + \Delta_k (d, 0)$ belongs to $X$ and is filtered for infinitely many $k \in K$.

The following theorem of Audet and Dennis [8] establishes the existence of limit
points of specific subsequences of interest. Its proof is omitted, since it is essentially
identical to that of Theorem 5.12.

Theorem 6.8 There exists a point $\hat{p} \in \{x \in X : f(x) \le f(x_0)\}$ and
a refining subsequence $\{p_k\}_{k \in K}$ (with associated index set $K$) such that
$\lim_{k \in K} p_k = \hat{p}$. Moreover, if $\mathcal{N}$ is continuous at $\hat{p}$, then there exists $\hat{y} \in \mathcal{N}(\hat{p})$
and $\hat{z} = (\hat{z}^c, \hat{y}^d) \in X$ such that

$$\lim_{k \in K} y_k = \hat{y} \quad \text{and} \quad \lim_{k \in K} z_k = \hat{z},$$

where each $z_k \in X$ is the endpoint of the EXTENDED POLL step initiated
at $y_k \in \mathcal{N}(p_k)$.

The  notation  in  Theorem  6.8  describing  specific  subsequences  and  their  limit 
points  will  be  retained  and  used  throughout  the  remainder  of  this  chapter. 

6.2.2  Results  for  the  Constraint  Violation  Function 

Theorem 6.8 defines and establishes existence of the limit points $\hat{p}$, $\hat{y}$, and $\hat{z}$. While
the next result applies to more general limit points of the algorithm, the remainder
of the results in this section apply to these specific limit points. This pattern will be
repeated in Section 6.2.3 as well. This first result, which is similar to a theorem in [6]
for $f$, requires a very mild condition on $h$. Note that this result will not hold for $f$
without an additional assumption because there is no guarantee that any subsequence
of objective function values is nonincreasing.

Theorem 6.9 If $h$ is lower semicontinuous with respect to the continuous
variables at a limit point $\hat{p}$ of poll centers $\{p_k\}$, then $\lim_k h(p_k)$ exists
and is greater than or equal to $h(\hat{p}) \ge 0$. If $h$ is continuous at every limit
point of $\{p_k\}$, then every limit point has the same function value.

Proof. If $h(\hat{p}) = 0$, the result follows trivially. Now let $h(\hat{p}) > 0$. Then $\hat{p}$ is a limit
point of a sequence of least infeasible points $p_k^I$, whose constraint violation values are
monotonically nonincreasing. Since $h$ is lower semicontinuous at $\hat{p}$, we know that for
any subsequence $\{p_k\}_{k \in K}$ of poll centers that converges to $\hat{p}$, $\liminf_{k \in K} h(p_k) \ge h(\hat{p}) \ge 0$.
But the subsequence of constraint violation function values at $p_k^I$ is a subsequence of a
nonincreasing sequence. Thus, the entire sequence is also bounded below by $h(\hat{p})$, and
so it converges. ■

We now characterize the limit points of Theorem 6.8 with respect to the constraint
violation function $h$. The following theorem establishes the local optimality of $h$ at
$\hat{p}$ with respect to its discrete neighbors. The proof is nearly identical to that of
Theorem 5.13.

Theorem 6.10 Let $\hat{p}$ and $\hat{y} \in \mathcal{N}(\hat{p})$ be defined as in the statement of
Theorem 6.8, with $\mathcal{N}$ continuous at $\hat{p}$. If $h$ is lower semi-continuous at $\hat{p}$
with respect to the continuous variables and continuous at $\hat{y}$ with respect
to the continuous variables, then $h(\hat{p}) \le h(\hat{y})$.

Proof. From Theorem 5.12, we know that $\{p_k\}_{k \in K}$ converges to $\hat{p}$ and $\{y_k\}_{k \in K}$
converges to $\hat{y}$. Since $k \in K$ ensures that $\{p_k\}_{k \in K}$ are mesh isolated poll centers, we
have $h(p_k) \le h(y_k)$ for all $k \in K$, and by the assumptions of continuity and lower
semi-continuity, we have $h(\hat{p}) \le \lim_{k \in K} h(p_k) \le \lim_{k \in K} h(y_k) = h(\hat{y})$. ■

The next two results establish a directional optimality condition for $h$ at $\hat{p}$ and at
certain $\hat{z}$ with respect to the continuous variables.

Theorem 6.11 Let $\hat{p}$ be a limit point of a refining subsequence. Under
Assumptions A1-A4, if $h$ is Lipschitz near $\hat{p}$ with respect to the continuous
variables, then $h^\circ(\hat{p}; (d, 0)) \ge 0$ for all limit directions $d \in D(\hat{p})$ of $\hat{p}$.

Proof. Let $\{p_k\}_{k \in K}$ be a refining subsequence with limit point $\hat{p}$ and let $d \in D(\hat{p})$
be a limit direction of $\hat{p}$. From Definition 3.9, we have

$$h^\circ(\hat{p}; (d, 0)) = \limsup_{y \to \hat{p},\ t \downarrow 0} \frac{h(y + t(d, 0)) - h(y)}{t} \ge \limsup_{k \in K} \frac{h(p_k + \Delta_k (d, 0)) - h(p_k)}{\Delta_k}.$$

The function $h$ is Lipschitz, hence finite, near $\hat{p}$. Since points that are infeasible with
respect to $X$ are not evaluated by the algorithm, the assumption of $d$ being a limit
direction of $\hat{p}$ ensures that infinitely many right-hand quotients are defined. All of
these quotients must be nonnegative, or else the corresponding POLL step would have
found an unfiltered point, a contradiction. ■
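A short worked example, added here for illustration and not taken from the source, shows what a nonnegative Clarke generalized directional derivative does and does not imply. The functions $g_1$ and $g_2$ are hypothetical one-dimensional examples, both Lipschitz with constant 1, so each generalized derivative at 0 is bounded above by $|d|$.

```latex
% Clarke generalized directional derivative of g at x in direction d:
%   g^\circ(x; d) = \limsup_{y \to x,\ t \downarrow 0} \frac{g(y + t d) - g(y)}{t}.
\[
  g_1(x) = |x| : \qquad g_1^\circ(0; d) = |d| \ge 0
  \quad \text{for all } d \in \mathbb{R},
\]
\[
  g_2(x) = -|x| : \qquad
  g_2^\circ(0; d)
  \;\ge\; \limsup_{t \downarrow 0} \frac{g_2(-td + td) - g_2(-td)}{t}
  \;=\; \limsup_{t \downarrow 0} \frac{0 + t|d|}{t}
  \;=\; |d| \;\ge\; 0 .
\]
```

Here $x = 0$ minimizes $g_1$ but maximizes $g_2$, yet both functions have nonnegative generalized directional derivatives at 0 in every direction. A condition of this form on a set of directions is therefore weak by itself, which is consistent with the later theorems adding differentiability and regularity hypotheses before drawing first-order conclusions.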

Theorem 6.12 Let $\hat{p}$, $\hat{y} \in \mathcal{N}(\hat{p})$, and $\hat{z}$ be defined as in the statement
of Theorem 6.8, with $\mathcal{N}$ continuous at $\hat{p}$, and let $\xi > 0$ denote a lower
bound on the extended poll triggers $\xi_k^f$ and $\xi_k^h$ for all $k$. If $h(\hat{y}) < h(\hat{p}) + \xi$
and $h$ is Lipschitz near $\hat{z}$ with respect to the continuous variables, then
$h^\circ(\hat{z}; (d, 0)) \ge 0$ for all limit directions $d \in D(\hat{z})$ of $\hat{z}$.

Proof. From Definition 3.9, we have

$$h^\circ(\hat{z}; (d, 0)) = \limsup_{y \to \hat{z},\ t \downarrow 0} \frac{h(y + t(d, 0)) - h(y)}{t} \ge \limsup_{k \in K} \frac{h(z_k + \Delta_k (d, 0)) - h(z_k)}{\Delta_k}.$$

Now, $h$ is Lipschitz, hence finite, near $\hat{z}$. Since $h(\hat{y}) < h(\hat{p}) + \xi$ ensures that extended
polling was triggered around $y_k \in \mathcal{N}(p_k)$ for all $k \in K$, and since $d$ is a limit direction
of $\hat{z}$, it follows that $z_k + \Delta_k (d, 0) \in X$ infinitely often in $K$, and infinitely many of the
right-hand quotients are defined. All of these quotients must be nonnegative, since
for $k \in K$, $z_k$ is an extended poll endpoint. ■

For bound or linear constraints, in order to guarantee the existence of limit
directions, for which Theorem 6.11 applies, each $D^i \subseteq D$, $i = 1, 2, \ldots, i_{\max}$, is constructed
in accordance with the algorithm given in Figure 4.3 to generate a sufficiently rich
set of directions to ensure conformity to $X^c$ (see Definition 4.3), consistent with
Assumption A3.

The next two key theorems establish conditions on $h$ at $\hat{p}$ and certain $\hat{z}$ to satisfy a
first-order optimality condition with respect to the continuous variables. Recall that
the meaning of D-regularity is given in Definition 3.17.

Theorem 6.13 Let $\hat{p}$ be the limit of a refining subsequence with limit
directions $D(\hat{p})$, and suppose $h$ satisfies the following conditions:

1. $h$ is Lipschitz near $\hat{p}$ with respect to the continuous variables,

2. $h$ is differentiable at $\hat{p}$ with respect to the continuous variables,

3. $h$ is $D(\hat{p})$-regular at $\hat{p}$ with respect to the continuous variables.

Then under Assumptions A1-A3, $\nabla^c h(\hat{p})^T w \ge 0$ for all $w \in T_{X^c}(\hat{p})$.
Moreover, if $X^c = \mathbb{R}^{n^c}$, or if $\hat{p}$ lies in the interior of $X^c$, then $0 = \nabla^c h(\hat{p}) \in
\partial^c h(\hat{p})$.

Proof. Since Assumption A3 ensures that the rule for selecting $D_k$ conforms to $X^c$
for some $\epsilon > 0$, and since there are finitely many linear constraints, $D(p_k) \to D(\hat{p})$,
and $D(\hat{p})$ positively spans $T_{X^c}(\hat{p})$. Theorem 6.11 guarantees that $h^\circ(\hat{p}; (d, 0)) \ge 0$ for
all $d \in D(\hat{p})$, and the result follows directly from Theorem 3.21. ■

Theorem 6.14 Let $\hat{p}$, $\hat{y} \in \mathcal{N}(\hat{p})$, and $\hat{z}$ be defined as in the statement of
Theorem 6.8, with $\mathcal{N}$ continuous at $\hat{p}$, and let $\xi > 0$ denote a lower bound
on the extended poll triggers $\xi_k^f$ and $\xi_k^h$ for all $k$. Let $D(\hat{z})$ denote the
limit directions of $\hat{z}$, and suppose $h$ satisfies the following assumptions:

1. $h$ is Lipschitz near $\hat{z}$ with respect to the continuous variables,

2. $h$ is differentiable at $\hat{z}$ with respect to the continuous variables,

3. $h$ is $D(\hat{z})$-regular at $\hat{z}$ with respect to the continuous variables.

If $h(\hat{y}) < h(\hat{p}) + \xi$, then under Assumptions A1-A4, $\nabla^c h(\hat{z})^T w \ge 0$ for
all $w \in T_{X^c}(\hat{z})$. Furthermore, if $X^c = \mathbb{R}^{n^c}$ or $\hat{z}^c$ lies in the interior of $X^c$,
then $0 = \nabla^c h(\hat{z}) \in \partial^c h(\hat{z})$.

Proof. Since Assumption A3 ensures that the rule for selecting $D_k$ conforms to $X^c$
for some $\epsilon > 0$, and since there are finitely many linear constraints, $D(z_k) \to D(\hat{z})$,
and $D(\hat{z})$ positively spans $T_{X^c}(\hat{z})$. Theorem 6.12 ensures that $h^\circ(\hat{z}; (d, 0)) \ge 0$ for all
$d \in D(\hat{z})$, and the result follows directly from Theorem 3.21. ■

Recall that in Theorem 5.18, we showed that, for any discrete neighbor $\hat{y} \in \mathcal{N}(\hat{p})$
satisfying $f(\hat{y}) = f(\hat{p})$, we have $f(\bar{y}) = f(\hat{p})$, where $\bar{y}$ is any limit point of the
sequence of extended poll points $\{y_k^j\}_{k \in K}$. The following theorem shows a similar
result for $h$, but requires the limit points $\hat{p}$ and $\hat{y}$ to be infeasible. The result does
not hold if the limit points are feasible because there is no guarantee that $\bar{y}$ must be
feasible, particularly if the extended poll centers are chosen infinitely often to be least
infeasible points with respect to the local filter.

Theorem 6.15 Let $\hat{p}$, $\hat{y} \in \mathcal{N}(\hat{p})$, and $\hat{z}$ be defined as in the statement
of Theorem 6.8, with $\mathcal{N}$ continuous at $\hat{p}$, and suppose $h(\hat{y}) = h(\hat{p}) > 0$. If
$h$ is continuous at $\hat{p}$ and $\hat{y}$ with respect to the continuous variables, then
under Assumptions A1-A4, either $h(\bar{y}) = 0$ or $h(\bar{y}) = h(\hat{p})$ for any limit
point $\bar{y} = (\bar{y}^c, \hat{y}^d)$ of the sequence of extended poll centers $\{y_k^j\}_{k \in K}$.
Moreover, if $\hat{y} \ne \hat{z}$, then there are infinitely many of these limit points $\bar{y}$,
and the index $J_k$ of the extended poll endpoint $y_k^{J_k}$ at iteration $k$ satisfies
$J_k \to +\infty$ (in $K$).

Proof. Let $\hat{y}$ in $\mathcal{N}(\hat{p})$ satisfy $h(\hat{y}) = h(\hat{p})$. Let $\bar{y}$ be a limit point of the sequence of
extended poll points $\{y_k^j\}_{k \in K}$, distinct from $\hat{y}$ and $\hat{z}$.

If $h(\hat{p}) > 0$, then the hypothesis that $h(\hat{p}) = h(\hat{y}) > 0$ means that, for all
sufficiently large $k \in K$, we have $0 < h(\hat{p}) \le h(y_k^j) \le h(y_k)$ for $j = 1, 2, \ldots, J_k$. Since
the subsequence $\{h(y_k)\}_{k \in K}$ converges to $h(\hat{p})$, we conclude that $h(\hat{p}) = h(\bar{y})$. The
remainder of the proof is identical to that of Theorem 5.18. ■

In Section 6.2.4, we will show a stronger version of the algorithm which will
guarantee for $\bar{y}$ the same optimality conditions for $h$ that apply to $\hat{p}$ and $\hat{z}$.

6.2.3  Results  for  the  Objective  Function 

We now address the properties of certain limit points with respect to the objective
function $f$. Unfortunately, in order to obtain results for $f$ that are similar to those
for $h$, an additional hypothesis must be added to most of the theorems that follow.
Additionally, convergence to a KKT point (with respect to the continuous variables)
cannot be guaranteed, but we will show a similar result to Theorem 4.20, in which a
cone is identified whose polar contains the normal cone.

The first result, under very mild conditions, is similar to previous results (see
Theorems 5.6 and 6.9), but requires polling to be centered at the best feasible point
at all but finitely many iterations.

Theorem 6.16 Under Assumption A1, there exists at least one limit
point $\hat{p}$ of the iteration sequence $\{p_k\}$ of poll centers. If $f$ is lower
semi-continuous at $\hat{p}$ with respect to the continuous variables, $h$ is continuous at
$\hat{p}$ with respect to the continuous variables, and $p_k = p_k^F$ for all but finitely
many $k$, then $\lim_k f(p_k)$ exists and is greater than or equal to $f(\hat{p})$, which
is finite. If $f$ is continuous at every limit point of $\{p_k\}$, then every limit
point has the same function value.

Proof. First, $p_k = p_k^F$, hence $h(p_k) = 0$, for all but finitely many $k$. Thus, $\{f(p_k)\}$ is
nonincreasing for all sufficiently large $k$. Since $f$ is lower semicontinuous at $\hat{p}$, we know that
for any subsequence $\{p_k\}_{k \in K}$ of poll centers converging to $\hat{p}$, $\liminf_{k \in K} f(p_k) \ge f(\hat{p})$.
But the subsequence of function values is a subsequence of a nonincreasing sequence
(for sufficiently large $k$). Thus, for sufficiently large $k$, the sequence is also bounded
below by $f(\hat{p})$, and so it converges. ■

The remainder of this section contains results for the limit points described by
Theorem 6.8. Each theorem contains an additional necessary hypothesis that, for
infinitely many iterations of the specified subsequence, trial points must be filtered
by the poll center (or extended poll endpoint), rather than a different filter point.

The following result, which is similar to Theorems 5.13 and 6.10, establishes
optimality conditions with respect to the discrete set of neighbors.

Theorem 6.17 Let $\hat{p}$ and $\hat{y} \in \mathcal{N}(\hat{p})$ be defined as in the statement of
Theorem 6.8, with $\mathcal{N}$ continuous at $\hat{p}$. If $f$ is lower semi-continuous at $\hat{p}$
and $\hat{y}$ with respect to the continuous variables, and if $f(p_k) \le f(y_k)$ for
infinitely many $k \in K$, then $f(\hat{p}) \le f(\hat{y})$.

Proof. From Theorem 6.8, we know that $\{p_k\}_{k \in K}$ converges to $\hat{p}$ and $\{y_k\}_{k \in K}$
converges to $\hat{y}$. Without loss of generality, we may assume that $h(p_k) < h_{\max}$ for
all $k \in K$. Then, since $f(p_k) \le f(y_k)$ for infinitely many $k \in K$, we have by the
assumptions of continuity and lower semi-continuity, that $f(\hat{p}) \le \lim_{k \in K} f(p_k) \le
\lim_{k \in K} f(y_k) = f(\hat{y})$. ■

The next two results establish conditions under which certain Clarke generalized
directional derivatives are nonnegative. The first theorem applies to $\hat{p}$, while the
second applies to some $\hat{z}$. As before, these theorems require the additional hypothesis
that the incumbent poll center or extended poll endpoint, rather than a different filter
point, filter the trial points infinitely often in the subsequence.

Theorem 6.18 Let $\hat{p}$ be a limit point of a refining subsequence $\{p_k\}_{k \in K}$,
and let $d \in D$ be a limit direction of $\hat{p}$. Under Assumptions A1-A4,
if $f$ is Lipschitz near $\hat{p}$ with respect to the continuous variables, and
$f(p_k) \le f(p_k + \Delta_k (d, 0))$ for infinitely many $k \in K$, then $f^\circ(\hat{p}; (d, 0)) \ge 0$.

Proof. From Definition 3.9, we have that

$$f^\circ(\hat{p}; (d, 0)) = \limsup_{y \to \hat{p},\ t \downarrow 0} \frac{f(y + t(d, 0)) - f(y)}{t} \ge \limsup_{k \in K} \frac{f(p_k + \Delta_k (d, 0)) - f(p_k)}{\Delta_k},$$

which is nonnegative, since an infinite number of terms in the right-hand quotient are
nonnegative. ■


Theorem 6.19 Let $\hat{p}$, $\hat{y} \in \mathcal{N}(\hat{p})$, $\hat{z}$, and $z_k$ be defined as in the statement
of Theorem 6.8, with $\mathcal{N}$ continuous at $\hat{p}$, and let $d \in D$ be a limit direction
for $\hat{z}$. Suppose that $f(\hat{y}) < f(\hat{p}) + \xi$, where $\xi > 0$ is a lower bound on
the extended poll triggers for all $k$. Under Assumptions A1-A4,
if $f$ is Lipschitz near $\hat{z}$ with respect to the continuous variables, and
$f(z_k) \le f(z_k + \Delta_k(d, 0))$ for infinitely many $k \in K$, then $f^\circ(\hat{z}; (d, 0)) \ge 0$.


Proof. From Definition 3.9, we have that

$$f^\circ(\hat{z}; (d, 0)) = \limsup_{y \to \hat{z},\ t \downarrow 0} \frac{f(y + t(d, 0)) - f(y)}{t} \ge \limsup_{k \in K} \frac{f(z_k + \Delta_k(d, 0)) - f(z_k)}{\Delta_k},$$

which is nonnegative, since an infinite number of terms in the right-hand quotient are
nonnegative. ■

The next two results describe the optimality conditions for $f$ at $\hat{p}$ and at certain $\hat{z}$
under the assumptions of local Lipschitz continuity, differentiability, and D-regularity
(see Definition 3.17) for a certain set of directions $D$. Once again, these theorems
require that trial points be filtered by the poll center (rather than a different filter
point) infinitely often in the subsequence.

As  is  the  case  with  the  Filter  GPS  algorithm,  convergence  to  a  KKT  point  cannot 
be  guaranteed  with  respect  to  the  continuous  domain,  since  there  is  no  guarantee 
that  the  negative  gradient  lies  inside  the  normal  cone;  however,  we  can  specify  a  cone 
inside  the  tangent  cone,  whose  polar  contains  the  negative  gradient. 

Theorem 6.20 Let $\hat{p}$ be a limit point of a refining subsequence $\{p_k\}_{k \in K}$,
and let $V_d$ be the cone generated by all limit directions $d \in D$ of $\hat{p}$ for
which $f(p_k) \le f(p_k + \Delta_k d)$ holds infinitely often. Suppose that $f$ satisfies
the following conditions:

1. $f$ is Lipschitz near $\hat{p}$ with respect to the continuous variables,

2. $f$ is differentiable at $\hat{p}$ with respect to the continuous variables,

3. $f$ is D-regular at $\hat{p}$ with respect to the continuous variables.

Then under Assumptions A1-A4, $-\nabla^c f(\hat{p})$ belongs to the polar $V_d^\circ$ of $V_d$.

Proof. By Theorem 6.18, $f^\circ(\hat{p}; (d, 0)) \ge 0$ for all $d \in V_d$, and by Theorem 3.21, we
have $\nabla^c f(\hat{p})^T w \ge 0$ for all $w \in V_d$. The result follows from the definition of a polar
cone: $-\nabla^c f(\hat{p}) \in \{v \in \mathbb{R}^n : v^T w \le 0 \ \forall\, w \in V_d\}$. ■
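The polar-cone condition in Theorem 6.20 can be checked numerically once the generating directions are known. The following is a minimal Python sketch, not part of the original analysis; the function name `in_polar_cone` and the sample gradient and directions are hypothetical. It relies on the fact that a vector lies in the polar of a finitely generated cone exactly when its inner product with every generator is nonpositive.

```python
def in_polar_cone(v, generators, tol=1e-12):
    """Return True if v^T w <= 0 (up to tol) for every generator w.

    Checking the generators is enough: every element of the cone is a
    nonnegative combination of them, so the sign of v^T w is preserved.
    """
    return all(
        sum(vi * wi for vi, wi in zip(v, w)) <= tol
        for w in generators
    )


# Hypothetical data: a gradient at the limit point and two limit directions.
grad = [1.0, 2.0]
limit_directions = [[1.0, 0.0], [0.0, 1.0]]
neg_grad = [-g for g in grad]

# The negative gradient makes a nonpositive inner product with both directions.
print(in_polar_cone(neg_grad, limit_directions))
```

In this toy example the gradient has a nonnegative inner product with each limit direction, so its negative lies in the polar cone, mirroring the last step of the proof.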

Theorem 6.21 Let $\hat{p}$, $\hat{y} \in \mathcal{N}(\hat{p})$, $\hat{z}$, and $z_k$ be defined as in the statement
of Theorem 6.8, with $\mathcal{N}$ continuous at $\hat{p}$, and suppose that $f(\hat{y}) < f(\hat{p}) + \xi$,
where $\xi > 0$ is a lower bound on the extended poll triggers for
all $k$. Let $V_d$ be the cone generated by all limit directions $d \in D$ of $\hat{z}$ for
which $f(z_k) \le f(z_k + \Delta_k d)$ holds infinitely often. Suppose that $f$ satisfies
the following conditions:

1. $f$ is Lipschitz near $\hat{z}$ with respect to the continuous variables,

2. $f$ is differentiable at $\hat{z}$ with respect to the continuous variables,

3. $f$ is D-regular at $\hat{z}$ with respect to the continuous variables.

Then under Assumptions A1-A4, $-\nabla^c f(\hat{z})$ belongs to the polar $V_d^\circ$ of $V_d$.

Proof. By Theorem 6.19, $f^\circ(\hat{z}; (d, 0)) \ge 0$ for all $d \in V_d$, and by Theorem 3.21, we
have $\nabla^c f(\hat{z})^T w \ge 0$ for all $w \in V_d$. The result follows from the definition of a polar
cone: $-\nabla^c f(\hat{z}) \in \{v \in \mathbb{R}^n : v^T w \le 0 \ \forall\, w \in V_d\}$. ■

As in Section 6.2.2, the remaining task is to characterize the limit point $\bar{y}$ of
extended poll points $\{y_k^{j_k}\}_{k \in K}$ when $f(\hat{y}) = f(\hat{p})$. However, we cannot guarantee that
$f(\bar{y}) = f(\hat{p})$ in general because, in the filter approach, we do not necessarily have
$f(y_k^j) \le f(y_k^{j-1})$ for all $j = 1, 2, \ldots, J_k$. In fact, a counterexample is found in [8].
Thus, to obtain the result $f(\bar{y}) = f(\hat{p})$, we must add this as a hypothesis, as the
following theorem shows.

Theorem 6.22 Let $\hat{p}$, $\hat{y} \in \mathcal{N}(\hat{p})$, and $\hat{z}$ be defined as in the statement
of Theorem 6.8, with $\mathcal{N}$ continuous at $\hat{p}$, and suppose that the extended
poll centers $\{y_k^j\}_{j=0}^{J_k}$ satisfy $f(y_k^j) \le f(y_k^{j-1})$, $j = 1, 2, \ldots, J_k$, for infinitely
many $k \in K$. Let $\bar{y} = (\bar{y}^c, \bar{y}^d)$ denote the limit point of the sequence of
extended poll centers $\{y_k^{j_k}\}_{k \in K}$, where $j_k$ is arbitrary. If $f$ is continuous at
$\hat{p}$ and $\bar{y}$ with respect to the continuous variables, then under Assumptions
A1-A4, $f(\bar{y}) = f(\hat{p})$.

Moreover, if $\bar{y} \ne \hat{z}$, then there are infinitely many of these limit points,
and the index $J_k$ of the extended poll endpoint $y_k^{J_k}$ at iteration $k$ satisfies
$J_k \to +\infty$ (in $K$).

Proof. Let $\hat{y}$ in $\mathcal{N}(\hat{p})$ satisfy $f(\hat{y}) = f(\hat{p})$, and let $\bar{y}$ be a limit point of the sequence
of extended poll points $\{y_k^{j_k}\}_{k \in K}$, distinct from $\hat{y}$ and $\hat{z}$.

Since, by assumption, $f(\hat{p}) \le f(y_k^j) \le f(y_k^{j-1})$, $j = 1, 2, \ldots, J_k$, for infinitely many
$k \in K$ and $f(y_k^0)$ converges to $f(\hat{p})$, we conclude that $f(\bar{y}) = f(\hat{p})$. The remainder of
the proof is exactly the same as the proof of Theorem 5.18. ■

Remark 6.23 The additional hypothesis in Theorems 6.17-6.21 that
the trial points be filtered by the current poll center infinitely often in
the associated subsequence is automatically satisfied under either of the
following two conditions:

1. The poll center (or extended poll center) is chosen to be the incumbent
best feasible point (or best feasible point with respect to the
local filter) infinitely often in the subsequence; i.e., $p_k = p_k^F$ (or
alternatively $z_k = z_k^F$) for infinitely many $k \in K$. To see this for
$p_k$, observe that $p_k = p_k^F$ infinitely often means that $h(p_k) = 0$
infinitely often, and since these $p_k$ are mesh isolated poll centers,
$f(p_k) \le f(p_k + \Delta_k d)$ for all limit directions $d \in D$ of $\hat{p}$. Note
that $p_k = p_k^F$ is chiefly an algorithmic choice, rather than a
problem-dependent condition.

2. The limit point is strictly feasible with respect to the nonlinear
constraints $C$, and $C$ is continuous at the limit point. This holds
because these two conditions ensure that for all sufficiently large $k \in K$,
$p_k = p_k^F$. ■

Remark 6.24 If there are no nonlinear constraints, then $p_k = p_k^F$ for all
$k$. Thus, the results presented in this section encompass all of the results
of Chapter 5.

Before presenting a stronger version of the algorithm, we must point out one other
key result that we adapt from Theorem 4.21, whose proof is adapted from [7].

Theorem 6.25 If $h$ and $f$ are strictly differentiable at poll center $p_k$ with
respect to the continuous variables, and if $\nabla^c f(p_k) \ne 0$, then there cannot
be infinitely many consecutive iterations where $p_k$ is a mesh isolated poll
center.

Proof. Let $f$ and $h$ be strictly differentiable at $p_k$ with respect to the continuous
variables, where $\nabla^c f(p_k) \ne 0$. Suppose that there are infinitely many iterations where
$p_k$ is a mesh isolated filter point. Let $d$ be a direction associated with the (constant)
subsequence of poll centers such that $\nabla^c f(p_k)^T d < 0$.

Since $f$ is strictly differentiable at $p_k$ with respect to the continuous variables, there
exists an $\epsilon > 0$ such that either $h(p_k + \Delta(d, 0)) \le h(p_k) < h_{\max}$, or
$h(p_k + \Delta(d, 0)) > h(p_k)$, for all $0 < \Delta < \epsilon$.

If the first condition is satisfied, then for $\Delta_k < \epsilon$, the POLL step will find an
unfiltered point, a contradiction. If the second condition is satisfied, then let $\tilde{h}$ be
the smallest value of

$$\{h(x) : h(x) > h(p_k),\ x \in F_k\} \cup \{h_{\max}\},$$

and let $\tilde{f}$ be the corresponding objective function value; i.e., either $\tilde{f} = f(x)$ for the
vector $x \in F_k$ that satisfies $h(x) = \tilde{h}$, or $\tilde{f} = -\infty$ in the case where $\tilde{h} = h_{\max}$. It
follows that $\tilde{h} > h(p_k)$ and $\tilde{f} < f(p_k)$. Therefore, for sufficiently small $\Delta_k < \epsilon$, we
have $h(p_k) < h(p_k + \Delta_k d) < \tilde{h}$ and $\tilde{f} < f(p_k + \Delta_k d) < f(p_k)$; thus, the trial mesh
point is unfiltered, a contradiction. ■
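The last step of this proof turns on what it means for a trial point to be unfiltered. The Python sketch below illustrates a simplified dominance test under stated assumptions: a point with constraint violation h and objective value f is treated as filtered if h is at least h_max, or if some filter point has both a smaller-or-equal h and a smaller-or-equal f. The function name `is_filtered` and the numerical filter entries are hypothetical, and any special handling of feasible points is omitted.

```python
def is_filtered(h, f, filter_points, h_max):
    """Simplified filter test on (h, f) pairs.

    filter_points is a list of (h, f) pairs already in the filter.
    A trial point is filtered if it violates the bound h_max or is
    dominated by a filter point in both h and f.
    """
    if h >= h_max:
        return True
    return any(hp <= h and fp <= f for hp, fp in filter_points)


# Hypothetical filter: the poll center is (h, f) = (1.0, 5.0), and the next
# filter point to its right is (3.0, 2.0), playing the role of the pair
# with the smallest h-value above h(p_k) in the proof.
filter_points = [(0.0, 10.0), (1.0, 5.0), (3.0, 2.0)]
h_max = 4.0

# A trial point strictly between the two in both h and f is unfiltered.
print(is_filtered(2.0, 3.0, filter_points, h_max))
# A trial point dominated by the poll center is filtered.
print(is_filtered(2.0, 6.0, filter_points, h_max))
```

The first trial point falls in the region the proof identifies: a larger h than the poll center but below the next filter value, with f between the two corresponding objective values.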

A limitation of this result is that, while it prevents a non-stationary $p_k$ from
being a mesh-isolated poll center for infinitely many consecutive iterations, it does
not completely prevent the algorithm from stalling there. The algorithm could still
generate an infinite number of consecutive iterations in which $p_k$ is either a
mesh-isolated filter point or a filter point that does not generate a new poll center. If,
for example, $p_k$ simply alternates between these two possibilities, then Theorem 6.25
holds, but the algorithm still stalls at $p_k$.

As in previous results, the additional hypothesis of $p_k = p_k^F$ for infinitely many
$k \in K$ would fully prevent stalling because it would force $h(p_k) = 0$ for infinitely
many $k \in K$, and the strict differentiability of $f$ at $p_k$ means that $\nabla^c f(p_k)^T d < 0$ for
some direction $d \in D_k$. Thus, for sufficiently large $k \in K$, $\Delta_k$ is sufficiently small to
force $f(p_k + \Delta_k d) < f(p_k)$, and the algorithm moves to a new point.

6.2.4  Stronger  Results 

In  this  section,  we  present  a  modified  version  of  the  FMGPS  algorithm,  in  which 
stronger  results  are  gained  by  requiring  the  algorithm  to  evaluate  every  extended  poll 
point,  even  if  a  point  is  found  that  is  unfiltered  with  respect  to  the  local  filter.  The 
price  for  the  stronger  result  is  additional  computational  cost. 

Specifically,  the  STRONG  EXTENDED  POLL  step  involves  the  following  two  changes 
at  each  iteration  k: 

STRONG  EXTENDED  POLL  step  (at  iteration  k): 

• For $j = 1, 2, \ldots, J_k$, the next poll center $y_k^{j+1}$ of EXTENDED POLL step $j$ is
selected only after the local filter has been updated with all unfiltered continuous
neighbors of the current extended poll center $y_k^j$.

• The same set of directions $D_k(y_k) \subseteq D(y_k)$ must be used throughout the
STRONG EXTENDED POLL step.

The two theorems below give the optimality results for $h$ at $\bar{y}$ under different
assumptions. The first assumes local Lipschitz continuity, while the second further
assumes differentiability and D-regularity (see Definition 3.17).

Theorem 6.26 Let $\hat{p}$ and $\hat{y} \in \mathcal{N}(\hat{p})$ be defined as in the statement of
Theorem 6.8, with $\mathcal{N}$ continuous at $\hat{p}$, and suppose $h(\hat{y}) = h(\hat{p})$. Let $\bar{y}$ be
a limit point of strong extended poll points $\{y_k^{j_k}\}_{k \in K}$, where $j_k$ is arbitrary,
and let $d \in D$ be any limit direction of $\bar{y}$. Under Assumptions A1-A4, if
$h$ is Lipschitz near $\bar{y}$, then $h^\circ(\bar{y}; (d, 0)) \ge 0$.

Proof. Suppose by way of contradiction that there exists a $d \in D$ such that
$h^\circ(\bar{y}; (d, 0)) < 0$. Then by Definition 3.9, we have

$$\limsup_{y \to \bar{y},\ t \downarrow 0} \frac{1}{t}\left[h(y + t(d, 0)) - h(y)\right] < 0.$$

By local Lipschitz continuity, there exists an $\epsilon > 0$ such that $h(\bar{y} + t(d, 0)) < h(\bar{y})$ for
all $t \in (0, \epsilon)$. Furthermore, without loss of generality, for each $t \in (0, \epsilon)$, there exists
an $\epsilon_t > 0$ such that $h(\bar{y} + t(d, 0) + s(b, 0)) < h(\bar{y})$ for all $s \in (0, \epsilon_t)$ and for all $b \in \mathbb{R}^n$
with $\|b\| = 1$.

Choose $k \in K$ sufficiently large such that $\|y_k^{j_k} - \bar{y}\| < \epsilon_t$ and $\Delta_k < \epsilon$. Now
observe that $y_k^{j_k} = \bar{y} + s(b, 0)$ for some $s \in \mathbb{R}$ and $b \in \mathbb{R}^n$ with $\|b\| = 1$. Then
$s = s\|(b, 0)\| = \|y_k^{j_k} - \bar{y}\| < \epsilon_t$. Use of the STRONG EXTENDED POLL step (with
$j_k < J_k$) guarantees that $h(y_k^{j_k+1}) \le h(y_k^{j_k} + \Delta_k(d, 0)) = h(\bar{y} + s(b, 0) + \Delta_k(d, 0)) =
h(\bar{y} + \Delta_k(d, 0) + s(b, 0)) < h(\bar{y})$, since $\Delta_k$ is a specific case of $t \in (0, \epsilon)$ and $s \in (0, \epsilon_t)$.
Thus, $h(y_k^{j_k+1}) < h(\bar{y}) = h(\hat{p}) \le h(p_k)$, which contradicts the assumption that $p_k$ is a
mesh isolated poll center. ■

Theorem 6.27 Let the assumptions of Theorem 6.26 hold, and suppose
that $h$ satisfies the following conditions:

1. $h$ is Lipschitz near $\bar{y}$ with respect to the continuous variables,

2. $h$ is differentiable at $\bar{y}$ with respect to the continuous variables,

3. $h$ is D-regular at $\bar{y}$ with respect to the continuous variables.

Then under Assumptions A1-A4, $\nabla^c h(\bar{y})^T w \ge 0$ for all $w \in T_{X^c}(\bar{y})$. If
$X^c = \mathbb{R}^{n^c}$ or if $\bar{y}$ lies in the interior of $X^c$, then $0 = \nabla^c h(\bar{y}) \in \partial^c h(\bar{y})$.

Proof. Since Theorem 6.26 applies to a set of directions that form a positive spanning
set for $T_{X^c}(\bar{y})$, the result follows directly from it and Theorem 3.21. ■

Because of the construction of the filter, equivalent results for the objective function
$f$ require stronger assumptions, similar to those in the results of Section 6.2.3.

Theorem 6.28 Let $\hat{p}$ and $\hat{y} \in \mathcal{N}(\hat{p})$ be defined as in the statement of
Theorem 6.8, with $\mathcal{N}$ continuous at $\hat{p}$, and assume that $f(\hat{y}) = f(\hat{p})$. Let
$\bar{y}$ be a limit point of strong extended poll centers $\{y_k^{j_k}\}_{k \in K}$, where $j_k$ is
arbitrary, and let $d \in D$ be any limit direction of $\bar{y}$. Under Assumptions A1-A4,
if $f$ is Lipschitz near $\bar{y}$ with respect to the continuous variables, and
$f(y_k^{j_k}) \le f(y_k^{j_k - 1})$, $j_k = 1, 2, \ldots, J_k$, holds for infinitely many $k \in K$, then
$f^\circ(\bar{y}; (d, 0)) \ge 0$.

Proof. Suppose by way of contradiction that there exists a $d \in D$ such that
$f^\circ(\bar{y}; (d, 0)) < 0$. Then by Definition 3.9, we have

$$\limsup_{y \to \bar{y},\ t \downarrow 0} \frac{1}{t}\left[f(y + t(d, 0)) - f(y)\right] < 0.$$

By local Lipschitz continuity, there exists an $\epsilon > 0$ such that $f(\bar{y} + t(d, 0)) < f(\bar{y})$ for
all $t \in (0, \epsilon)$. Furthermore, without loss of generality, for each $t \in (0, \epsilon)$, there exists
an $\epsilon_t > 0$ such that $f(\bar{y} + t(d, 0) + s(b, 0)) < f(\bar{y})$ for all $s \in (0, \epsilon_t)$ and for all $b \in \mathbb{R}^n$
with $\|b\| = 1$.

Choose $k \in K$ sufficiently large such that $\|y_k^{j_k} - \bar{y}\| < \epsilon_t$ and $\Delta_k < \epsilon$. Now
observe that $y_k^{j_k} = \bar{y} + s(b, 0)$ for some $s \in \mathbb{R}$ and $b \in \mathbb{R}^n$ with $\|b\| = 1$. Then
$s = s\|(b, 0)\| = \|y_k^{j_k} - \bar{y}\| < \epsilon_t$.

Since, by assumption, we have $f(y_k^j) \le f(y_k^{j-1})$, $j = 1, 2, \ldots, J_k$, for infinitely
many $k \in K$, use of the STRONG EXTENDED POLL step (with $j_k < J_k$) guarantees
that $f(y_k^{j_k+1}) \le f(y_k^{j_k} + \Delta_k(d, 0)) = f(\bar{y} + s(b, 0) + \Delta_k(d, 0)) = f(\bar{y} + \Delta_k(d, 0) + s(b, 0)) <
f(\bar{y})$ for infinitely many $k \in K$, since $\Delta_k$ is a specific case of $t \in (0, \epsilon)$ and $s \in (0, \epsilon_t)$.
Thus, $f(y_k^{j_k+1}) < f(\bar{y}) = f(\hat{p}) \le f(p_k)$, which contradicts the assumption that $p_k$ is a
mesh isolated poll center. ■

Theorem 6.29 Let the assumptions of Theorem 6.28 hold, and let $V_d$ be
the cone generated by all limit directions $d \in D$ of $\bar{y}$ for which
$f(y_k^{j_k}) \le f(y_k^{j_k} + \Delta_k d)$, $j_k = 0, 1, \ldots, J_k$, holds for infinitely many $k \in K$. If $f$
satisfies the assumptions of Theorem 6.20 for $\bar{y}$, then under Assumptions
A1-A4, $-\nabla^c f(\bar{y})$ belongs to the polar $V_d^\circ$ of $V_d$.

Proof. By Theorem 6.28, $f^\circ(\bar{y}; (d, 0)) \ge 0$ for all $d \in V_d$, and by Theorem 3.21, we
have $\nabla^c f(\bar{y})^T w \ge 0$ for all $w \in V_d$. The result follows from the definition of a polar
cone: $-\nabla^c f(\bar{y}) \in \{v \in \mathbb{R}^n : v^T w \le 0 \ \forall\, w \in V_d\}$. ■

6.3  Summary 

The  FMGPS  Algorithm  presented  and  analyzed  here  represents  a  generalization  of 
much  of  the  work  of  Torczon  [130],  Lewis  and  Torczon  [94,  95],  and  Audet  and 
Dennis  [6,  7,  8].  That  is,  the  algorithms  described  in  these  papers  are  specific  instances 
of  the  FMGPS  class  of  algorithms,  and  the  convergence  theory  presented  here  applies 
to a broader class of problems that includes the work in these papers.

In summary, if Assumptions A1-A4 are satisfied, then we get the following
hierarchy of convergence results:

• If $h$ is lower semi-continuous at any limit point $\hat{p}$ of the GPS iteration sequence,
then $h(\hat{p}) \le \lim_k h(p_k)$. A similar result for $f$ requires that $p_k = p_k^F$ for all but
finitely many $k$.

• Every limit point of the iteration sequence at which $h$ is continuous has the same
function value $\lim_k h(p_k)$, whether or not it is a stationary point. A similar result
for $f$ requires that $p_k = p_k^F$ for all but finitely many $k$.

• Theorem 6.8 defines and establishes the existence of the limit points $\hat{p}$, $\hat{y}$, and
$\hat{z}$, a limit point of extended poll endpoints.

• If $h$ is lower semi-continuous at $\hat{p}$ with respect to the continuous variables and
continuous at $\hat{y}$ with respect to the continuous variables, and if $\mathcal{N}$ is continuous
at $\hat{p}$, then $h(\hat{p}) \le h(\hat{y})$. A similar result for $f$ requires that $y_k$ be filtered by the
current poll center infinitely often in the subsequence.

• If the function $h$ is Lipschitz near $\hat{p}$ with respect to the continuous variables,
then $h^\circ(\hat{p}; (d, 0)) \ge 0$ for any limit direction $d$ of $\hat{p}$. A similar result holds at $\hat{z}$.
Similar results for $f$ at $\hat{p}$ and $\hat{z}$ require that $p_k + \Delta_k d$ (or $z_k + \Delta_k d$) be filtered
by the current poll center infinitely often in the subsequence.

• If $h$ is strictly differentiable (or its equivalent on the tangent cone) at $\hat{p}$ with
respect to the continuous variables, then $\hat{p}$ is a first-order stationary point with
respect to the continuous variables. A similar result holds at certain $\hat{z}$.

• If $f$ is strictly differentiable (or its equivalent on the tangent cone) at $\hat{p}$ with
respect to the continuous variables, then the polar of the cone formed by certain
limit directions of $\hat{p}$ contains $-\nabla f(\hat{p})$. While a first-order stationary point cannot be
guaranteed in general, it can be if $\hat{p}$ is strictly feasible. A similar result holds
at certain $\hat{z}$.

•  If  a  stronger  version  of  the  algorithm  is  used,  in  which  the  entire  extended 
poll  set  is  evaluated  at  every  step,  then  the  preceding  three  items  apply  to  all 
limit  points  of  extended  poll  points  associated  with  refining  subsequences,  and 
not  just  the  endpoints  of  EXTENDED  POLL  steps.  In  essence,  this  allows  for  a 
stronger  optimality  condition  in  a  special  case. 

In Chapter 7, the FMGPS algorithm is applied to the problem of designing a
load-bearing thermal insulation system.

Chapter 7

Designing  an  Optimal  Thermal  Insulation  System 

In  this  chapter,  a  new  Matlab®  implementation  of  the  FMGPS  algorithm  is  described, 
and  is  then  applied  to  the  design  of  a  load-bearing  thermal  insulation  system.  The  goal 
of  the  optimization  problem  is  to  minimize  the  power  required  to  properly  operate  the 
system,  subject  to  various  bound,  linear,  and  nonlinear  constraints.  Computational 
results  demonstrate  that  the  algorithm  performs  in  accordance  with  the  theoretical 
results  described  in  Chapter  6. 

7.1  NOMADm  Software 

The FMGPS algorithm has been coded in Matlab® function form and incorporated
into an interactive Matlab® software package, called NOMADm, available for download
on the internet [1]. The NOMADm software requires the user to supply up to
five Matlab® function files as follows:

1. a function defining the objective and nonlinear constraint functions (and optionally
any available derivative information),

2.  a  function  that  returns  the  initial  iterate, 

3.  a  function  that  returns  the  bound  vectors  and  coefficient  matrix  that  define  the 
bound  and  linear  constraints  (if  these  exist), 

4. a function defining the discrete set of neighbors (required only for MVP problems),
and

5. a function that defines any parameters used by the other user files (optional).

Additionally, NOMADm has the following features:

• An implementation of the scheme given in Figure 4.3 for computing
tangent cone generators of a linearly constrained region, so that polling directions
always conform to the geometry of the constraints (see Definition 4.3).

• A cache or database of previously computed iterates so that, in the case of
expensive function evaluations, multiple evaluations of the same point are avoided.

• Several common choices for the SEARCH step, including additional polls around
other filter points and Latin hypercube search (see [107, 126]). It also includes
“hooks” for user-defined search and surrogate functions, so that users can seamlessly
attach their own favorite method to the code.

•  Optional  scaling  of  the  mesh  directions  in  a  manner  similar  to  that  of  [41]. 

•  Additional  optional  termination  criteria,  including  limits  on  the  number  of  GPS 
iterations,  number  of  function  calls,  and  CPU  time. 

•  Real-time  plots  of  the  filter  (similar  to  Figure  4.5)  and  iteration  history. 

•  Options  for  using  derivative  information  (see  Chapter  8)  to  reduce  the  number 
of  function  evaluations  required  for  convergence. 
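The cache feature listed above can be illustrated with a short sketch. The Python below is only an illustration of the idea of avoiding repeated evaluations of the same point; it is not NOMADm code (NOMADm is written in Matlab), and the class name `EvaluationCache` and the test function are hypothetical.

```python
class EvaluationCache:
    """Wrap an expensive function so each distinct point is evaluated once."""

    def __init__(self, func):
        self.func = func
        self.store = {}   # maps a point (as a tuple) to its function value
        self.calls = 0    # number of true function evaluations performed

    def __call__(self, x):
        key = tuple(x)
        if key not in self.store:
            self.calls += 1
            self.store[key] = self.func(x)
        return self.store[key]


# Hypothetical objective: sum of squares of the coordinates.
cached_f = EvaluationCache(lambda x: sum(v * v for v in x))

first = cached_f([1.0, 2.0])
second = cached_f([1.0, 2.0])   # served from the cache, no new evaluation
print(first, second, cached_f.calls)
```

Because pattern search iterates lie on a mesh, the same point can be revisited exactly, which is why an exact-match lookup of this kind is useful when each evaluation is costly.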

7.2  Problem  Description 

In a thermal insulation system, heat intercepts are often used to minimize the heat
flow from a hot to a cold surface (or vice versa). Figure 7.1 illustrates an example of
such a system of fixed length $L$, in which power is applied to maintain each intercept
$i$ at a cooling temperature $T_i$, $i = 1, 2, \ldots, n$. An insulator of thickness $x_i$ is placed
between each pair of intercepts $i - 1$ and $i$, with the convention that $i = 0$ and $i = n + 1$
represent the cold and hot surfaces, respectively, so that $T_0 = T_C$ and $T_{n+1} = T_H$.
Note that each insulator in Figure 7.1 may have a different cross-sectional area. The
design variables for the system include the number and cooling temperatures of the
intercepts, and the insulator types and thicknesses. Furthermore, we assume that the
system must be load-bearing, meaning that the insulators act as mechanical supports;
thus, only solid materials can be used.


Figure  7.1  Schematic  of  a  Thermal  Insulation  System 

Variations  of  this  type  of  system  used  in  cryogenic  engineering  applications,  such 
as  superconducting  magnetic  energy  storage  systems  and  space  borne  magnets,  have 
been  studied  by  several  authors.  Hilal  and  Boom  [71]  used  a  gradient-based  optimizer 
to  minimize  power  for  n  =  1,  2,  and  3  intercepts  with  two  choices  of  insulators  of 
constant  cross-sectional  area,  but  without  mixing  insulator  types  within  the  system. 
Hilal  and  Eyssa  [72]  studied  the  same  problem,  but  with  variable  cross  sectional 
area  for  the  mechanical  supports.  In  considering  systems  with  more  general  types 
of  insulation,  Chato  and  Khodadidi  [30]  sought  to  minimize  entropy,  similar  to  the 
formulation given by Bejan [15]. Other related attempts to optimize the design of
these systems are found in [99], [112], and [137]. An actual implementation of these
types  of  systems  for  the  Large  Hadron  Collider  (LHC)  project  is  discussed  in  three 
technical  reports  [45,  81,  105].  While  all  of  these  studies  vary  in  geometry  and  fidelity 
of  the  underlying  models,  none  of  them  optimizes  with  respect  to  the  categorical 
variables;  namely,  the  number  of  intercepts  and  types  of  insulators. 

Kokkolaras, Audet, and Dennis [89] extended the Hilal and Boom model to include
the categorical variables in the optimization problem, allowing for a mixture
of different types of insulators in the system. They achieved a 65% reduction in
required refrigeration power from that of [71] using the MGPS algorithm described in
Chapter 5 to solve this bound constrained MVP problem. However, other than
selecting the specific list of possible insulators, they did not consider certain
load-bearing aspects of the problem, such as thermal expansion, system mass, and stress.

7.2.1  Basic  Model  Formulation 

In  this  section,  we  reformulate  the  Kokkolaras  et  al.  model  (as  extended  from  that 
of  [71])  to  include  constraints  on  thermal  expansion,  stress,  and  mass.  Much  of 
this  discussion  comes  directly  from  the  presentation  in  [89],  although  some  of  the 
discussion  of  stress  is  similar  to  that  of  [72].  The  new  model  can  be  expressed  as 

$$\min_{(n, I, x, T) \in X} f(n, I, x, T) \quad \text{s.t.} \quad g(x) \le 0, \tag{7.1}$$

with  the  following  nomenclature: 

•  n  is  the  number  of  heat  intercepts,  with  the  convention  that  the  cold  and  hot 
walls  are  numbered  0  and  n  +  1,  respectively. 




• $I \in \mathcal{I}^{n+1}$ is the list of insulators used, where $I_i$ represents the insulator type
between intercepts $i - 1$ and $i$, and $\mathcal{I}$ denotes the finite set of possible insulator
types.

• $x \in \mathbb{R}^n$ is the vector whose $i$-th component is the thickness of the $i$-th insulator,
with the convention that $x_{n+1} = L - \sum_{i=1}^{n} x_i$.

• $T \in \mathbb{R}^n$ is the vector whose $i$-th component is the temperature of the $i$-th
intercept, with the convention that $T_0 = T_C$ and $T_{n+1} = T_H$.

• The feasible region $X$ is defined by the following linear and categorical
constraints:

$$n \in \{1, 2, \ldots, n_{\max}\}, \tag{7.2}$$

$$I \in \mathcal{I}^{n+1}, \tag{7.3}$$

$$\sum_{i=1}^{n} x_i \le L, \tag{7.4}$$

$$x_i \ge 0, \quad i = 1, \ldots, n, \tag{7.5}$$

$$T_{i-1} \le T_i \le T_{i+1}, \quad i = 1, \ldots, n. \tag{7.6}$$


A difficulty in solving this problem is that the dimensions of the vectors $I$, $x$, and
$T$ depend on the variable $n$. For any value of $n$, there are $n + 1$ other categorical
variables and $2n$ continuous variables, yielding a total of $3n + 2$ variables.
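To make the dependence of the variable dimensions on n concrete, the following Python sketch checks whether a candidate design satisfies the linear and categorical constraints (7.2)-(7.6). It is an illustration only: the function name `in_feasible_region`, the argument names, and the sample insulator names and temperatures are hypothetical, and (7.4) is read here as requiring the thicknesses to sum to at most L.

```python
def in_feasible_region(n, insulators, x, temps, length, n_max, types,
                       t_cold, t_hot):
    """Check constraints (7.2)-(7.6) for one candidate design."""
    if not (1 <= n <= n_max):                       # (7.2)
        return False
    if len(insulators) != n + 1:                    # n + 1 insulator slots
        return False
    if any(ins not in types for ins in insulators):  # (7.3)
        return False
    if len(x) != n or len(temps) != n:              # 2n continuous variables
        return False
    if sum(x) > length:                             # (7.4), assumed form
        return False
    if any(xi < 0 for xi in x):                     # (7.5)
        return False
    # Pad with the wall temperatures T_0 and T_{n+1}, then check (7.6).
    full = [t_cold] + list(temps) + [t_hot]
    return all(full[i - 1] <= full[i] <= full[i + 1] for i in range(1, n + 1))


# Hypothetical design with n = 2 intercepts: 1 + 3 + 4 = 3n + 2 = 8 variables.
ok = in_feasible_region(
    2, ["nylon", "teflon", "nylon"], [10.0, 20.0], [50.0, 150.0],
    length=100.0, n_max=5, types={"nylon", "teflon"},
    t_cold=4.2, t_hot=300.0,
)
print(ok)
```

Changing n changes the required lengths of all three vectors at once, which is the difficulty the text describes.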

7.2.2  Objective  Function 

The objective function represents the total refrigeration power of the system; thus,

$$f = \sum_{i=1}^{n} P_i, \tag{7.7}$$

where $P_i$ is the power applied to intercept $i$, $i = 1, 2, \ldots, n$. The required power to
keep intercept $i$ at a fixed temperature $T_i$ is given by

$$P_i = C_i \,(q_{i+1} - q_i), \tag{7.8}$$

where $C_i$ (a function of temperature) is the thermodynamic cycle efficiency coefficient
at intercept $i$, and $q_i$ represents the heat flow from intercept $i$ to $i - 1$.

In  general,  heat  flow  q  through  a  volume  is  governed  by  Fourier’s  law, 

qdx  =  Ak(T)dT ,  (7.9) 


where  A  (a  function  of  the  spatial  coordinates  in  the  z-y  plane  perpendicular  to 
the  x  coordinate)  is  the  cross-sectional  area  of  the  volume,  and  k  (a  function  of  the 
temperature  variable  T)  is  the  effective  thermal  conductivity  of  the  volume. 

For the problem at hand, the heat flow $q_i$ from intercept $i$ to $i - 1$ is given by

$$q_i = \frac{A_i}{x_i} \int_{T_{i-1}}^{T_i} k(T; I_i)\, dT, \quad i = 1, 2, \ldots, n + 1, \tag{7.10}$$

where $A_i$ denotes the cross-sectional area of insulator $i$, and the thermal conductivity
$k$ is a function of both the temperature $T$ and the type of insulator $I_i$.

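By substituting (7.10) into (7.8), the objective function can be expressed by (7.7),
with

$$P_i = C_i \left[ \frac{A_{i+1}}{x_{i+1}} \int_{T_i}^{T_{i+1}} k(T; I_{i+1})\, dT - \frac{A_i}{x_i} \int_{T_{i-1}}^{T_i} k(T; I_i)\, dT \right]. \tag{7.11}$$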
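The objective defined by (7.7), (7.8), and (7.10) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the thesis implementation: the integrals are approximated with a composite trapezoidal rule, and the function names, the constant conductivity, and the constant efficiency coefficient in the example are all hypothetical.

```python
def integrate(func, a, b, steps=200):
    """Composite trapezoidal rule for the integral of func over [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for j in range(1, steps):
        total += func(a + j * width)
    return total * width


def heat_flow(area, thickness, t_low, t_high, conductivity):
    """Heat flow through one insulator, following the form of (7.10)."""
    return area / thickness * integrate(conductivity, t_low, t_high)


def total_power(areas, thicknesses, temps, conductivities, efficiency):
    """Total power (7.7) with P_i = C_i (q_{i+1} - q_i) as in (7.8).

    temps holds [T_0, T_1, ..., T_{n+1}]; the other lists hold data for
    insulators 1..n+1 at 0-based positions 0..n.
    """
    n = len(temps) - 2
    # q[i] stores q_{i+1} of the text (insulator between T_i and T_{i+1}).
    q = [
        heat_flow(areas[i], thicknesses[i], temps[i], temps[i + 1],
                  conductivities[i])
        for i in range(n + 1)
    ]
    return sum(efficiency(temps[i]) * (q[i] - q[i - 1])
               for i in range(1, n + 1))


# Hypothetical one-intercept system with unit conductivity everywhere.
power = total_power(
    areas=[1.0, 1.0],
    thicknesses=[1.0, 1.0],
    temps=[0.0, 1.0, 3.0],
    conductivities=[lambda T: 1.0, lambda T: 1.0],
    efficiency=lambda T: 2.0,
)
print(power)
```

In this toy case the two heat flows are 1 and 2, so the single intercept requires a power of 2 times their difference; realistic conductivities and efficiency coefficients would come from material data.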


7.2.3  Nonlinear  Constraints 


The inequality in (7.1) represents the new nonlinear constraints added to the MVP
problem of [89], which allow us to numerically test the FMGPS algorithm. Although
we add three types of constraints (namely, mass, stress, and thermal expansion), we
model the stress constraint as an implicit equality constraint so that we can eliminate
insulator cross-sectional areas as design variables.

The first constraint concerns the overall mass of the system. This constraint may
actually be budgetary in nature, as larger amounts of insulation material increase the
overall cost. Since the weight of each insulator can be expressed as a product of the
density of the insulator material and its volume, we have the constraint

$$\sum_{i=1}^{n} \rho_i(I_i)\, A_i x_i \le m_{\max}, \tag{7.12}$$

where $\rho_i(I_i)$ represents the density of material used as insulator $i$, and $m_{\max}$ is the
maximum allowable mass of the system. Note that setting each volume to $A_i x_i$
assumes that the area of the insulator is constant throughout the interval; however,
this is not an unreasonable assumption.

We also assume that the system must be capable of bearing a specified load (or
force) $F$. The stress, defined as load per unit area, must not be allowed to exceed
a certain level. In this case, we constrain the stress applied to each insulator to be
no greater than the tensile yield strength of that insulator (thus, we assume that the
load is suspended from the system, rather than resting on top of it). The tensile yield
strength at insulator $i$, denoted by $\sigma_i(T; I_i)$, is a function of both the insulator type
and temperature. Thus, we have constraints of the form

$$\frac{F}{A_i} \le \min\{\sigma_i(T; I_i) : T_{i-1} \le T \le T_i\}, \quad i = 1, 2, \ldots, n + 1. \tag{7.13}$$


The  difficulty  with  the  constraints  given  in  (7.12)  and  (7.13))  is  that  they  treat  the 
areas  Ai,  i  =  1, . . . ,  n,  as  additional  design  variables.  However,  there  is  a  convenient 
and  perfectly  legitimate  way  around  this  problem.  First,  observe  that  the  direct 
relationship  between  At  and  Pt  in  (7.11)  means  that  decreasing  the  cross-sectional 
area  of  any  insulator  reduces  the  power  applied  to  the  corresponding  intercept,  which 


is  exactly  the  goal  of  the  optimization  problem.  Thus,  at  an  optimal  point,  each  A{ 
should  be  as  small  as  possible.  Furthermore,  as  each  At  is  made  smaller,  the  stress 
constraint  in  (7.13)  becomes  binding  -  and  must  be  so  at  optimality.  Therefore,  we 
can  assume  that  (7.13)  holds  with  equality,  solve  for  A4,  and  substitute  the  resulting 
equation  into  (7.12).  We  must  also  make  this  substitution  into  the  objective  function. 
This  eliminates  as  a  design  variable  and  yields  a  stress-mass  constraint  of  the  form 


$$F \sum_{i=1}^{n+1} \frac{\rho_i\, x_i}{\min\{\sigma_i(T; I_i) : T_{i-1} \le T \le T_i\}} \le m_{\max}. \tag{7.14}$$


The  final  constraint  is  one  associated  with  the  system’s  thermal  expansion,  or 
in  our  cryogenic  application,  thermal  contraction.  In  addition  to  moving  the  heat 
intercepts  out  of  their  optimal  position,  thermal  contraction  causes  additional  stress 
on  the  materials  and,  if  excessive,  can  cause  other  difficulties,  such  as  deformations  in 
the  material.  The  development  of  this  constraint  presented  here  is  adapted  from  [13]. 

Since  different  insulators  at  different  temperatures  exhibit  different  contraction 
behaviors,  we  must  treat  thermal  contraction  of  each  insulator  separately  as  a  change 
in  its  thickness.  The  constraint  can  then  be  expressed  as  a  weighted  sum,  where  each 
insulator’s  weight  is  simply  its  thickness  divided  by  the  total  length  of  the  system; 


i.e.,

$$\sum_{i=1}^{n+1} \left(\frac{\Delta x_i}{x_i}\right) \left(\frac{x_i}{L}\right) \le \frac{\delta}{100}, \tag{7.15}$$

where $\delta$ is a limit on the percent total contraction of the system.

For insulator $i$, $e(T; I_i)$ denotes the unit thermal contraction from intercept $i$ to
any point between intercepts $i$ and $i - 1$ (a function of temperature) and is computed
by the formula,

where A is the linear coefficient of thermal expansion for the insulator type.
Precomputed values for $e(T; I_i)$ are available in lookup tables for a wide range of
temperatures and for several material types [13, 108]. Total thermal contraction for an
insulator $i$ is given by


$$\frac{\Delta x_i}{x_i} = \frac{\int_{T_{i-1}}^{T_i} e(T; I_i)\, k(T; I_i)\, dT}{\int_{T_{i-1}}^{T_i} k(T; I_i)\, dT}; \tag{7.17}$$


thus,  the  nonlinear  thermal  expansion  constraint  is  given  by 


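$$\sum_{i=1}^{n+1} \left(\frac{\int_{T_{i-1}}^{T_i} e(T; I_i)\, k(T; I_i)\, dT}{\int_{T_{i-1}}^{T_i} k(T; I_i)\, dT}\right) \left(\frac{x_i}{L}\right) \le \frac{\delta}{100}. \tag{7.18}$$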


The  resulting  optimization  problem  can  now  be  expressed  by  the  objective  function 
in  (7.7)  and  (7.11),  the  linear  constraints  defined  in  (7.2)-(7.6),  and  the  nonlinear 
constraints  defined  by  (7.14)  and  (7.18). 


7.3  Computational  Model 

This  section  further  describes  the  optimization  problem  in  terms  of  modelling  deci¬ 
sions,  material  data  sources,  and  other  problem  setup  issues. 

7.3.1  Material  Data 

The insulator types are the same as those in [89]; namely, nylon, teflon, fiberglass
epoxy (both normal and plane), 6063-T5 aluminum, 1020 low-carbon steel, and
304 stainless steel. For each of these materials, a substantial amount of engineering
data was required. Thermal conductivity and contraction data were obtained from
lookup tables in [13] and [108], while material densities were found in [125] and [108],
and tensile yield strength data were obtained from [44] and [106].

Thermodynamic cycle efficiency coefficients $C_i$, $i = 1, 2, \ldots, n$ (see (7.8)) are
dependent on temperature as follows:

$$C_i = \begin{cases} 5, & \text{if } T \le 4.2\ \mathrm{K} \\ 4, & \text{if } 4.2\ \mathrm{K} < T < 71\ \mathrm{K} \\ 2.5, & \text{if } T \ge 71\ \mathrm{K}. \end{cases} \tag{7.19}$$


In  order  to  make  the  model  as  accurate  and  efficient  as  possible,  cubic  splines 
were  used  to  fit  all  of  the  data  found  in  lookup  tables,  including  thermal  conduc¬ 
tivity,  thermal  contraction,  and  tensile  strength  data.  Numerical  integrations  were 
performed  by  applying  a  composite  Simpson’s  Rule,  with  nodes  matching  those  of 
the  cubic  spline.  This  eliminates  truncation  error,  since  Simpson’s  Rule  is  exact  for 
cubic  polynomials  [85]. 
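
The exactness claim can be checked numerically. The following Python sketch applies Simpson's rule on each interval of a piecewise cubic, using the interval endpoints and midpoint, and compares the result with the analytic integral. The breakpoints and coefficients are made-up illustrative values, not material data from the thesis, and all function names are hypothetical.

```python
def cubic(coeffs, t):
    """Evaluate c0 + c1*t + c2*t**2 + c3*t**3 (Horner form)."""
    c0, c1, c2, c3 = coeffs
    return c0 + t * (c1 + t * (c2 + t * c3))


def cubic_integral(coeffs, a, b):
    """Exact integral of the cubic over [a, b] from its antiderivative."""
    c0, c1, c2, c3 = coeffs

    def antiderivative(t):
        return c0 * t + c1 * t**2 / 2 + c2 * t**3 / 3 + c3 * t**4 / 4

    return antiderivative(b) - antiderivative(a)


def simpson_panel(g, a, b):
    """Simpson's rule on one panel [a, b] using the midpoint."""
    return (b - a) / 6.0 * (g(a) + 4.0 * g(0.5 * (a + b)) + g(b))


def simpson_piecewise(breaks, pieces):
    """Sum one Simpson panel per piece of a piecewise cubic."""
    total = 0.0
    for a, b, coeffs in zip(breaks[:-1], breaks[1:], pieces):
        total += simpson_panel(lambda t, c=coeffs: cubic(c, t), a, b)
    return total


# Illustrative breakpoints (in K) and one cubic per interval (assumed values).
breaks = [4.2, 20.0, 77.0, 300.0]
pieces = [
    (0.1, 0.02, -1e-4, 2e-6),
    (0.3, 0.01, 5e-5, -1e-7),
    (1.0, -2e-3, 1e-5, -1e-8),
]

approx = simpson_piecewise(breaks, pieces)
exact = sum(
    cubic_integral(c, a, b) for a, b, c in zip(breaks[:-1], breaks[1:], pieces)
)
print(approx, exact)
```

The two values agree to rounding error because each panel contains a single cubic piece; a panel that straddled a spline knot would not, in general, be integrated exactly.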


7.3.2  Choosing  Discrete  Neighbors 

Recall  from  the  discussion  in  Section  5.1,  and  especially  from  Definition  5.3,  that  the 
neighborhood  structure  that  the  user  chooses  to  incorporate  determines  the  definition 
of  a  minimizer.  That  is,  when  the  solution  of  an  MVP  problem  is  found,  it  is  with 
respect  to  the  user-specified  discrete  set  of  neighbors.  If  a  neighborhood  structure 
is  chosen  so  that  all  other  sets  of  discrete  variable  values  are  neighbors,  a  more 
global  solution  can  be  obtained,  but  at  extraordinary  computational  cost.  On  the 
other  hand,  severely  restricting  the  size  of  the  set  of  neighbors  will  save  significant 
computational  cost,  but  a  local  optimizer  with  a  higher  objective  function  value  is 
likely  to  be  obtained. 

In  order  to  make  proper  comparisons,  the  set  of  neighbors  we  use  for  this  problem 
is  exactly  the  same  as  was  used  by  Kokkolaras  et  al.  [89].  It  includes  designs  in  which 
the  following  occur: 

•  The type of insulator between any two heat intercepts is changed to any other
type, while insulator thicknesses and intercept temperatures remain constant.

•  An intercept and the insulator above it are removed, while thicknesses of the
remaining insulators are increased proportionally (rounded to the nearest integer
multiple of the current mesh size) to fill the remaining space.

•  A new intercept and an insulator underneath it are added with the following
properties:

-  The type of insulator is the same as the one below it,

-  The cooling temperature is set to the average of the two intercepts adjacent
to it, rounded to the nearest integer multiple of the current mesh size,

-  The thicknesses of the new insulator and of the insulator below it are each
set to half of the latter's original thickness, rounded to the nearest integer
multiple of the current mesh size.

Note  that  rounding  to  the  nearest  integer  multiple  of  the  current  mesh  size  is 
necessary  to  ensure  that  the  trial  point  lies  on  the  mesh. 
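
The rounding step can be sketched as follows. This Python fragment is one illustrative reading of the third neighbor type (adding an intercept by splitting an existing insulator); it is not NOMADm code. The list-based design representation, the index convention, the mesh value, and the function names are all assumptions made for the example, and the sketch does not re-check the total-length constraint after rounding.

```python
import math


def round_to_mesh(value, mesh):
    """Round to the nearest integer multiple of mesh (ties round up)."""
    return math.floor(value / mesh + 0.5) * mesh


def add_intercept(types, thick, temps, j, mesh):
    """Split insulator j in two and place a new intercept between the halves.

    types[j], thick[j] describe insulator j; temps[j] and temps[j + 1] are
    the temperatures bounding it (temps includes both surface temperatures).
    """
    half = round_to_mesh(thick[j] / 2.0, mesh)
    t_new = round_to_mesh((temps[j] + temps[j + 1]) / 2.0, mesh)
    new_types = types[: j + 1] + [types[j]] + types[j + 1:]
    new_thick = thick[:j] + [half, half] + thick[j + 1:]
    new_temps = temps[: j + 1] + [t_new] + temps[j + 1:]
    return new_types, new_thick, new_temps


# Starting design from the text: one intercept at 150 K in the middle of a
# 100 cm system, nylon on the cold side and teflon on the hot side.
types = ["N", "T"]
thick = [50.0, 50.0]
temps = [4.2, 150.0, 300.0]
mesh = 0.625  # assumed current mesh size

new_types, new_thick, new_temps = add_intercept(types, thick, temps, 0, mesh)
print(new_types, new_thick, new_temps)
```

With a coarse mesh, rounding both halves can change the total thickness, so an implementation would need some additional repair or feasibility check that is not shown here.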

7.4  Computational  Results 

We now present results of several NOMADm runs and show that the approach
significantly improves the design of Hilal and Eyssa [72], and is comparable to the results
of Kokkolaras et al. [89], even though the problem takes on the additional nonlinear
load-bearing constraints.

Table  7.1  shows  the  data  parameters  chosen  for  the  results  that  follow.  The  first 
four  have  values  identical  to  the  choices  of  Kokkolaras  et  al.  [89],  while  the  remaining 
parameters  are  unique  to  this  problem. 

Table 7.1 Thermal Insulation Problem: Model Parameters

Parameter                             Symbol    Value
Hot surface temperature               T_h       300 K
Cold surface temperature              T_c       4.2 K
System total length                   L         100 cm
Maximum number of intercepts          n_max     10
Load placed on the system             F         250 kN
Maximum total system mass             m_max     10 kg
Maximum system thermal contraction    δ         5%

To match the setup of [89] as much as possible, runs were performed with an
initial mesh size of $\Delta_0 = 10$ and terminated when the condition $\Delta_k < 0.15625$ was
achieved. Also, mesh coarsening was not used. The mesh refinement strategy used
by [89] could not be duplicated by NOMADm; thus, in our runs, we refine by simply
dividing the mesh size parameter $\Delta_k$ in half. Extended poll triggers for the objective
and constraint violation function were set at one and five percent, respectively, the
former being consistent with [89].

Also  to  match  the  setup  of  [89],  we  set  the  initial  design  to  consist  of  1  intercept 
placed  exactly  in  the  middle  of  the  system  and  set  at  150  K,  with  a  nylon  insulator 
on  the  cold  side  and  a  teflon  insulator  on  the  hot  side. 

When  the  filter  logic  of  the  FMGPS  is  applied  for  the  nonlinear  constraints,  the 
SEARCH  step  consists  of  a  poll  around  the  least  infeasible  point,  while  polling  is 
performed  around  the  best  feasible  point. 

7.4.1  Validation 

The software and function files were validated by mimicking the designs of [71] and
[72], running these problems, and comparing the designs. In both of these previous
papers, the authors applied their optimizer to cases of 1-3 heat intercepts with
insulators made of either 304 stainless steel or plane-cloth fiberglass epoxy. For stainless
steel, our results matched theirs almost exactly. For fiberglass epoxy, there were some
slight differences when more than one intercept was used, but these were noted in [89]
as well. These differences were most likely caused by different methods of computing
the objective function integrals. As in [89], we used cubic splines to fit thermal
conductivity data that were available only in tabular form for specific temperatures.
However, rather than apply a Matlab® integration routine, we applied our own
implementation of Simpson’s rule, which is exact for cubic polynomials. The inaccuracy
is more visible in the epoxy results because thermal conductivity data were only
available at four temperatures, as opposed to the 18 different temperatures available for
stainless steel.

When we recomputed the results of [89] to validate our mixed variable logic, we
converged to a different design with a similarly low objective function value. Table 7.2
shows the differences between the two runs, where the materials cited there are
abbreviated as follows: N = nylon, E = epoxy (normal), Ep = epoxy (plane),
and T = teflon. Note that, although power is optimized in our software, we report
normalized power at termination, in which power is multiplied by the system length
L and divided by the smallest cross-sectional area of any insulator. Previous authors
have expressed results this way so that designs can be compared independently of these
two parameters. We keep this convention for the same purpose.

While the numerical integration issue just described can lead to small deviations,
we suspect that the difference in mesh refinement strategies led to a different local
optimizer along a different path. In [89], with a starting mesh size of $\Delta_0 = 10$, the
mesh refinement strategy was to divide the current mesh size by $2^\ell$, where $\ell = 1, 2, \ldots$
is incremented each time the mesh is refined. Since NOMADm is currently incapable
of incrementing $\ell$, our mesh refinement consists of simply dividing the current mesh
size by two (i.e., $\ell = 1$).
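
To make the difference between the two refinement rules concrete, the short Python sketch below lists the mesh sizes each rule produces from a starting mesh size of 10. It follows the description given in the text of the rule used in [89]; the function names are hypothetical, and the sketch models only the refinement arithmetic, not when refinements are triggered.

```python
def halving_sequence(delta0, refinements):
    """Mesh sizes when every refinement divides the mesh size by two."""
    sizes = [delta0]
    for _ in range(refinements):
        sizes.append(sizes[-1] / 2.0)
    return sizes


def increasing_power_sequence(delta0, refinements):
    """Mesh sizes when the ell-th refinement divides by 2**ell."""
    sizes = [delta0]
    for ell in range(1, refinements + 1):
        sizes.append(sizes[-1] / 2.0**ell)
    return sizes


halving = halving_sequence(10.0, 6)
increasing = increasing_power_sequence(10.0, 3)
print(halving)     # [10.0, 5.0, 2.5, 1.25, 0.625, 0.3125, 0.15625]
print(increasing)  # [10.0, 5.0, 1.25, 0.15625]
```

Both rules reach a mesh size of 0.15625, but halving visits three intermediate meshes (2.5, 0.625, and 0.3125) that the incrementing rule skips, so the two runs poll on different sets of trial points.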


Table 7.2 Thermal Insulation Problem: MVP Validation

Problem:            Original                FMGPS ReRun
Power (PL/A):       25.294 W/cm             25.589 W/cm
Insulators Used:    NNNNNNNEEET             NNNNNNTEETT

 i    x_i (cm)    T_i (K)     x_i (cm)    T_i (K)
 1    0.3125      4.2188      4.5313      6.125
 2    5.4688      7.3438      6.7188      10.55
 3    3.9062      10          4.8437      14.35
 4    6.5625      15          4.2188      17.994
 5    5.7812      20          7.3438      24.969
 6    5.1562      25          9.8438      36.006
 7    13.2812     40          24.948      71.094
 8    21.4062     71.0938     12.135      116.88
 9    8.5938      101.25      7.5         156.88
10    9.2188      146.25      6.4063      198.44
11    20.3125                 11.5105

7.4.2  Adding  the  Nonlinear  Constraints 

The nonlinear constraints were added to the runs in three steps. First, we simply
tested the two designs from Table 7.2 against the new nonlinear constraints and found
that the thermal contraction constraint for both designs was violated by
approximately 8%. This suggests that a new design having a different material configuration
should be expected as the new constraints are incorporated. Second, the implicit
constraint on stress (given by (7.13) with equality) was added to allow variable
cross-sectional areas, and thus match the formulation of Hilal and Eyssa [72]. We
refer to this as the partial model. Finally, the mass and thermal contraction constraints
were added to complete the full model. By doing so, the resulting change in required
power represents the cost of satisfying the additional load-bearing constraints.

Table  7.3  shows  the  results  for  the  partial  and  full  models  (columns  3  and  4), 
along  with  the  design  found  by  Hilal  and  Eyssa  [72]  (column  2).  For  each  run,  the 
minimal  required  power  is  normalized  by  multiplying  by  the  total  system  length  and 
dividing  by  the  smallest  cross-sectional  area  of  an  insulator.  Following  the  power  and 
the  material  configuration  for  each  design,  insulator  thicknesses  and  heat  intercept 
temperature  settings  are  listed. 


Table 7.3 Thermal Insulation Problem: Results

Problem:            Hilal & Eyssa          Partial Model          Full Model
Power (PL/A):       53.2 W/cm              24.551 W/cm            23.768 W/cm
Insulators Used:    EpEpEp                 EEEEEEEEEEEE           EEEEEEEEEEEE

 i    x_i (cm)   T_i (K)    x_i (cm)   T_i (K)    x_i (cm)   T_i (K)
 1    22.0       8.38       7.1875     6.5875     0.625      4.25
 2    23.8       36.3       11.406     12.938     8.125      7.7375
 3    24.8       116.6      15.625     25.85      7.9688     12.369
 4    29.4                  29.531     71.094     7.8125     18.094
 5                          6.875      100.31     12.344     29.912
 6                          5          127.66     26.094     71.094
 7                          2.5        143.13     8.125      105.94
 8                          2.5        159.06     5.3125     135.47
 9                          4.6875     188.59     5          165.94
10                          5          222.5      5.625      202.03
11                          9.688                 12.9682

We can see immediately that the addition of the implicit stress constraint results in
a variable cross-sectional area design that requires over 50% less (normalized) power
than the design found by Hilal and Eyssa [72]. These savings were expected because the
newer formulation allows for varying the number of heat intercepts and the mixing
of insulator types. A similar (65%) savings was achieved by Kokkolaras et al. [89] in
optimizing the constant cross-sectional area formulation of Hilal and Boom [71].

A  bit  surprising  was  the  slight  decrease  in  power  when  the  final  two  constraints 
are  added  (see  columns  3  and  4),  particularly  because  this  is  the  point  at  which  a 
linearly  constrained  problem  becomes  a  nonlinearly  constrained  one,  and  the  new 
algorithm’s  filter  logic  is  applied.  Recall  that  the  theory  ensures  convergence  to  a 
first-order  stationary  point  for  the  partial  model  (since  all  its  constraints  are  linear), 
but  does  not  do  so  for  the  full  model.  In  spite  of  this,  the  new  algorithm  still  found 
a  better  feasible  design.  It  is  indeed  possible  that  the  FMGPS  algorithm  generated  a 
different  sequence  and  simply  terminated  near  a  better  local  minimizer. 

Figure 7.2 illustrates the performance of the FMGPS algorithm on the full model,
where the power required for the incumbent best design is plotted versus the number
of function evaluations. The lower plot is a magnification of the upper one. The
“L”-shaped plot is very typical behavior of derivative-free methods, since good stopping
rules for these methods are difficult to devise. The “stair steps” seen in the right-hand plot
indicate polling sequences of varying length.

We should note that the power values shown on the vertical axes of these plots do
not match the data in Table 7.3 because they represent two different things. The
objective function is to minimize power, as measured in Figure 7.2, but the required
power shown in Table 7.3 is normalized (hence the PL/A notation), so as to allow
comparisons with the results of Hilal and Eyssa [72].

Figure 7.2 Iteration History for the
Thermal Insulation System Design Problem

Figure 7.3 depicts the progression of the filter during the run of the full model,
where the plots in the right column are magnifications of those on the left. The
three rows are “snapshots” taken after 150, 200, and 500 function evaluations,
respectively. Although the algorithm terminated after more
than 9000 function evaluations, changes in the filter after 500 function evaluations
could not be detected within the resolution of the plot. This is consistent with the
long and shallow progression of the best objective function value seen in Figure 7.2.

In the filter plots, the asterisks represent a subset of the best feasible points found
up to that point, while the filter is represented by the “stair step” lines. In this run,
the constraints were scaled by dividing each constraint by its right-hand side and
then subtracting one from both sides. Thus, in the left-column plots, the choice of
$h_{\max} = 1$ represents a 100% constraint violation.
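
The scaling just described can be sketched in a few lines of Python. A constraint g(x) ≤ b with b > 0 becomes c(x) = g(x)/b - 1 ≤ 0, so c(x) = 1 means the left-hand side is twice its limit. The violation measure used below (sum of squared positive parts) is an assumption made for illustration, since the definition of h is given elsewhere in the thesis; the function names and the numerical values are likewise hypothetical.

```python
def scaled_constraint(g_value, rhs):
    """Scale g(x) <= rhs to c(x) = g(x)/rhs - 1 <= 0 (assumes rhs > 0)."""
    return g_value / rhs - 1.0


def violation(scaled_values):
    """Assumed violation measure: sum of squared positive parts of c_j(x)."""
    return sum(max(0.0, c) ** 2 for c in scaled_values)


# Hypothetical design whose mass is 20 kg against a 10 kg limit.
c_mass = scaled_constraint(20.0, 10.0)
h_double = violation([c_mass])

# Hypothetical design with feasible mass (8 kg) and a contraction of 5.4
# against a limit of 5, i.e. an 8% violation of that single constraint.
c_ok = scaled_constraint(8.0, 10.0)
c_contraction = scaled_constraint(5.4, 5.0)
h_small = violation([c_ok, c_contraction])

print(c_mass, h_double, c_contraction, h_small)
```

Under this assumed measure, a single constraint violated by 100% gives h = 1, and satisfied constraints contribute nothing to h.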

Figure 7.3 Filter Progression for the Full Model

Table 7.3 also shows that the new constraints yield an insulator configuration that
differs significantly from that of Kokkolaras et al. [89]: one consisting entirely of
normal-cloth fiberglass epoxy. This is consistent with the raw data we used, which
shows epoxy to have low thermal conductivity and higher resistance to stress than
nylon or teflon.

However, some of the other materials (including nylon and teflon) have better
thermal contraction properties than epoxy, which also has a low thermal stress threshold
that would be tested by any thermal contraction. Modelling thermal stress
as an additional constraint that depends on thermal contraction would be an
interesting extension to this problem and might result in a completely different material
configuration.

7.5  Conclusions 

The results shown here demonstrate that, although the FMGPS algorithm can be
expensive, particularly for mixed variable problems, it successfully generated
much-improved designs for this problem. The design of [72] has been significantly improved,
and the addition of constraints on stress, mass, and thermal contraction yields a more
realistic feasible design with essentially no additional power required over that of [89].

One  other  extension  of  this  problem  that  would  be  interesting  to  test  in  future  work 
is  to  allow  the  cross-sectional  area  of  each  insulator  to  vary  continuously  within  its 
interval.  This  requires  some  refinement  of  the  mathematical  model  and  the  resulting 
computer  code,  but  it  is  certainly  achievable.  We  would  also  like  to  perform  more  runs 
that  allow  for  more  heat  intercepts,  so  as  to  compare  further  results  with  [89],  but 
NOMADm  is  not  currently  capable  of  performing  the  requisite  number  of  function 
evaluations  in  a  reasonable  amount  of  time. 

Chapter 8

GPS  Algorithms  with  Derivative  Information 

8.1  Motivation 

In  this  chapter,  a  key  question  is  answered;  namely,  how  to  incorporate  any  available 
derivative  information  into  GPS  algorithms  in  such  a  way  that  can  speed  convergence, 
yet  not  sacrifice  the  enhanced  global  performance  that  can  be  implemented  into  these 
methods.  The  new  material  presented  here  is  in  collaboration  with  Audet  and  Dennis, 
and  except  for  one  example,  is  also  found  in  [2]. 

When derivatives are available, one might ask why a direct search method would
be chosen over the faster gradient-based or Newton-based methods. The primary
reason is that gradient-based and Newton-based methods tend to converge quickly
to local minima. The user may be willing to accept the cost of the additional
function evaluations that direct search methods require in the hope of obtaining a better
local minimum. Also, as will be shown, the algorithm presented here can make use of
derivatives when some, but not all, partial derivatives are known, or when certain
directional derivatives are known. This is of great benefit in some engineering
applications where computing some directional derivatives is expensive and computing
others is not. In this case, use of the inexpensive derivatives can speed convergence
without sacrificing much in computational cost.

For the sake of simplicity, the ideas in this chapter are applied to the basic GPS
algorithm, discussed in Section 4.1, that is used to solve the optimization problem
given in (4.1). The ideas are easily extendable to bound and linearly constrained
MVP problems; however, applying them to NLP or MVP problems with nonlinear
constraints is more difficult and remains an open area of research. The current
implementation used in the NOMADm software (see Chapter 7) is a variant of the
feasible directions method in which directions are constructed so that the resulting
trial points remain on the GPS mesh.

8.2  The  Basic  GPS  Algorithm  With  Derivative  Information 

The main idea of this chapter is that whenever the gradient of the objective function
$\nabla f(x_k)$ at the current iterate $x_k$ is available, it can be used to prune (or eliminate
from consideration) the ascent directions from the poll set. The pruned set of polling
directions, denoted $D_k^p \subseteq D_k$, is defined as follows.

Definition 8.1 Given a positive spanning set $D_k$, the pruned set of
polling directions, denoted by $D_k^p \subseteq D_k$, is given by

$$D_k^p = \{d \in D_k : d^T \nabla f(x_k) \le 0\}$$

when the gradient $\nabla f(x_k)$ is known, or by

$$D_k^p = D_k \setminus \{d \in V_k : f'(x_k; d) > 0\},$$

when incomplete derivative information is known, where $V_k \subseteq D_k$ is the
set of directions in which the directional derivative is known. In the case
where the gradient is known, $-\nabla f(x_k)$ is said to prune $D_k$ to $D_k^p$.

Note that the pruning operation depends only on $x_k$, $D_k$, and the availability of
the gradient, and is independent of $\Delta_k$. We can now define the pruned poll set at
iteration $k$ as

$$P_k^p = \{x_k + \Delta_k d : d \in D_k^p\}. \tag{8.1}$$

The poll step of the algorithm evaluates the constraints and objective at points in
$P_k^p$ until an improvement is found. If an improvement is not found, then the incumbent
solution $x_k$ is said to be a mesh local optimizer with respect to the pruned poll set.
Notice that if all the points in the pruned poll set lie outside X, then $f$ would not be
evaluated at all in order to conclude that $x_k$ is a mesh local optimizer. We have seen
large savings from this fact in preliminary tests.
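
A minimal Python sketch of the pruning operation and of the pruned poll set in (8.1), for the case where the full gradient is known. The direction set, gradient, iterate, and mesh size below are illustrative values only, and the function names are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def prune_directions(directions, gradient):
    """Keep the directions d with d^T grad f(x_k) <= 0 (Definition 8.1)."""
    return [d for d in directions if dot(d, gradient) <= 0]


def pruned_poll_set(x, delta, directions, gradient):
    """Trial points x_k + delta_k * d for every unpruned direction d."""
    kept = prune_directions(directions, gradient)
    return [tuple(xi + delta * di for xi, di in zip(x, d)) for d in kept]


# Coordinate directions in two dimensions (a positive basis).
D_k = [(1, 0), (-1, 0), (0, 1), (0, -1)]
grad = (1.0, -2.0)  # assumed gradient at the current iterate
x_k = (0.0, 0.0)
delta_k = 0.5

kept = prune_directions(D_k, grad)
poll = pruned_poll_set(x_k, delta_k, D_k, grad)
print(kept)  # [(-1, 0), (0, 1)]
print(poll)  # [(-0.5, 0.0), (0.0, 0.5)]
```

Here two of the four directions are ascent directions and are pruned, so the poll step needs at most two function evaluations instead of four.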

The  GPS  algorithm  that  uses  derivative  information  is  presented  in  Figure  8.1. 

Generalized  Pattern  Search  with  Derivative  Information 

Initialization: Let $x_0$ be such that $f(x_0)$ is finite, and let $M_0$ be the mesh defined
by $\Delta_0 > 0$ and $x_0$ (see (4.2)).

For k = 1, 2, ..., perform the following:

1.  SEARCH step: Employ some finite strategy seeking an improved mesh point;
i.e., $x_{k+1} \in M_k$ such that $f(x_{k+1}) < f(x_k)$.

2.  POLL step: If the SEARCH step fails to find an improved mesh point, evaluate
$f$ at points in the pruned poll set $P_k^p$ until an improved mesh point $x_{k+1}$ is
found (or until done).

3.  Update: If SEARCH or POLL finds an improved mesh point, set $\Delta_{k+1}$
according to (4.5); otherwise, set $x_{k+1} = x_k$ and set $\Delta_{k+1}$ according to (4.6).


Figure  8.1  GPS  Algorithm  with  Derivative  Information 
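
The loop in Figure 8.1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the NOMADm implementation: the SEARCH step is left empty, the problem is unconstrained, the full gradient is assumed known, and the update rules (4.5) and (4.6), which are not reproduced in this chapter, are replaced by "keep the mesh size on success, halve it on failure". The test function and all names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gps_with_gradient(f, grad, x0, directions,
                      delta0=1.0, delta_min=1e-6, max_iter=1000):
    """Pattern search whose POLL step only visits unpruned directions."""
    x = [float(xi) for xi in x0]
    fx = f(x)
    evals = 1
    delta = delta0
    for _ in range(max_iter):
        if delta < delta_min:
            break
        g = grad(x)
        if all(gi == 0.0 for gi in g):
            break  # zero gradient: first-order stationary point
        # Prune ascent directions (Definition 8.1, gradient known).
        pruned = [d for d in directions if dot(d, g) <= 0.0]
        improved = False
        for d in pruned:  # POLL step over the pruned poll set
            trial = [xi + delta * di for xi, di in zip(x, d)]
            f_trial = f(trial)
            evals += 1
            if f_trial < fx:
                x, fx, improved = trial, f_trial, True
                break
        if not improved:
            delta /= 2.0  # assumed stand-in for the refinement rule
    return x, fx, delta, evals


def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)]


D = [(1, 0), (-1, 0), (0, 1), (0, -1)]
x_best, f_best, delta_final, n_evals = gps_with_gradient(f, grad, [0.0, 0.0], D)
print(x_best, f_best, n_evals)
```

On this small quadratic the run stops at the minimizer because the gradient vanishes there; at no iteration does the poll step visit more than three of the four coordinate directions.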

Clearly,  if  the  gradient  is  available  and  is  zero,  then  the  algorithm  can  be  stopped 
with  a  solution  satisfying  first  order  optimality  conditions  (a  similar  stopping  criterion 
could  be  devised  when  the  norm  of  the  gradient  is  small  enough). 

If the gradient is available and is nonzero, Theorem 3.3 (also see [38]) guarantees
that the negative gradient must prune at least one column of $D_k$, since there must
be at least one column for which $d^T \nabla f(x_k) > 0$. Moreover (by the same theorem), it
could prune as many as $|D_k| - 1$ directions (i.e., all but one), leading to considerable
savings for an expensive function. In Section 8.3, we show how to systematically
construct such a set by a special choice of $D$ and $D_k$.

Of  course,  we  expect  that  our  pruning  operation  will  yield  more  iterates  with  poll 
steps  that  fail  to  find  an  improved  mesh  point,  but  this  should  happen  less  frequently 
as  the  mesh  size  parameter  gets  smaller.  Furthermore,  the  role  of  the  POLL  step  is 
mainly  to  ensure  that  the  current  iterate  is  a  mesh  local  optimizer.  This  means  that 
the  string  of  successful  searches  on  the  current  mesh  is  halted  and  searching  can  start 
afresh  on  a  finer  mesh.  But  for  a  sufficiently  small  mesh  parameter,  if  our  pruned 
POLL  fails,  then  the  “unpruned”  POLL  would  also  fail.  This  suggests  that  derivatives 
might  be  more  useful  when  the  mesh  parameter  is  small  and  the  POLL  step  would  not 
be  expected  to  get  out  of  the  current  basin.  Of  course,  the  search  can  always  move 
us  into  a  deeper  basin. 

8.3  Using  Derivative  Information  to  Prune  Well 

Each  iteration  of  a  GPS  algorithm  is  dependent  upon  the  choice  of  a  positive  spanning 
set  Dk,  selected  from  the  columns  of  the  larger  finite  positive  spanning  set  D.  Here 
it  is  useful  to  think  of  D  as  preselected,  but  in  fact,  it  can  be  modified  finitely  many 
times  at  the  discretion  of  the  user. 

In this section, we first consider a simple measure of the richness of the set $D$.
Theorem 3.3 shows that $D$ is a positive spanning set for $\mathbb{R}^n$ if and only if, for every
$v \in \mathbb{R}^n$, there exists $d \in D$ such that $v^T d > 0$. This says that a positive spanning
set has at least one member in every half space. We are interested in $D$ so rich
in directions that, for every half space, we can select a subset of vectors in $D$ that
positively spans $\mathbb{R}^n$, only one of which lies in that half space. We will show that
no positive basis $D$ can have this property, and we will prove that $\mathbb{D} = \{-1, 0, 1\}^n$ has
it. We build up to this point by considering some useful finer measures of
directional richness in $D$. An important observation is that when gradients are not
available, positive bases are useful in GPS because they give small poll sets [93], but
when derivative information is available, positive bases may be a poor choice because
a richer choice of directions makes it easier to isolate a descent direction and prune
to a smaller poll set.

The  following  definition  is  consistent  with  the  earlier  Definition  8.1,  although  it  is 
more  general  in  keeping  with  the  applications  to  follow. 

Definition 8.2 (i) Given a positive spanning set $D \subseteq \mathbb{R}^n$ and a nonzero
vector $v \in \mathbb{R}^n$, let $D(v) \subseteq D$ be a positive spanning set that minimizes
the cardinality of the pruned set $D^p(v) = \{d \in D(v) : d^T v \ge 0\}$ (that
cardinality is denoted $p(D, v)$). Then the vector $v$ is said to prune $D$ to
$p(D, v)$ vectors.

(ii) Let $p(D) = \max_{v \in \mathbb{R}^n \setminus \{0\}} p(D, v)$. Then the positive spanning set $D$ is
said to prune to $p(D)$ vectors.

First we collect some results on $p(D, v)$. From the discussion following Definition 8.1,
the reader will see that $p(D, -\nabla f(x_k))$ is the minimal number of polling directions
that can be found by pruning choices of $D_k = D(-\nabla f(x_k)) \subseteq D$. Consequently, the
cardinality of the pruned poll set $P_k^p$ constructed from $D_k^p = D^p(-\nabla f(x_k)) \subseteq D_k \subseteq D$
is minimal. We will say that $P_k^p$ is a minimal poll set.

These  results  imply  that  the  richer  the  directional  choice  in  a  positive  spanning  set 
D,  the  more  sagacity  can  be  employed  to  ensure  that  Dk  can  be  extensively  pruned. 
Of  course,  if  D  is  not  only  a  positive  spanning  set,  but  also  a  positive  basis,  then  one 
would  expect  to  allow  less  pruning  since  D(v)  =  D  for  every  choice  of  v.  Indeed,  it 
is  not  surprising  for  a  positive  basis  D  that  p(D)  grows  with  the  dimension  n. 

Proposition 8.3 If $D$ is a positive spanning set, then $1 \le p(D, v) \le
p(D) \le |D| - 1$ for all nonzero $v \in \mathbb{R}^n$. Moreover, if $D$ is a positive basis,
then $\frac{n+1}{2} \le \frac{|D|}{2} \le p(D) < |D| \le 2n$.

Proof. The first assertion is direct, because for any $v \ne 0$ and any choice of positive
spanning set, Theorem 3.3 guarantees that there must be at least one member of the
set that makes a positive inner product with $v$, and another that makes a negative
one. Thus, every possible $D^p(v)$ contains at least one element and at most $|D| - 1$.

This argument also guarantees that $p(D) < |D|$. To derive a lower bound on
$p(D)$, we simply consider $v$ and $-v$. Since no proper subset of a positive basis
is a positive spanning set, the only possible choice for $D(v)$ and for $D(-v)$ is $D$.
Thus, $D^p(v) \cup D^p(-v) = D$, and so $2p(D) \ge p(D, v) + p(D, -v) \ge |D|$. The other
inequalities follow from [38], where it is shown that $2n \ge |D| \ge n + 1$ for any positive
basis. ■
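
As an illustrative check of these bounds (an example added for concreteness, assuming the pruned set in Definition 8.2 is defined with a nonnegative inner product), consider the maximal positive basis formed by the coordinate directions and their negatives. Because it is a positive basis, the only admissible choice is $D(v) = D$ for every nonzero $v$.

```latex
\begin{align*}
D &= \{e_1, \ldots, e_n, -e_1, \ldots, -e_n\}, \qquad |D| = 2n, \\
v = (1, \ldots, 1)^T &: \quad D^p(v) = \{e_1, \ldots, e_n\},
    \qquad p(D, v) = n = \tfrac{|D|}{2}, \\
v = e_1 &: \quad D^p(v) = D \setminus \{-e_1\},
    \qquad p(D, v) = 2n - 1 = |D| - 1, \\
&\Longrightarrow \quad p(D) = 2n - 1 .
\end{align*}
```

The first choice of $v$ attains the lower bound $|D|/2$ on $p(D, v) + p(D, -v)$ evenly split between $v$ and $-v$, while the second attains the upper bound $|D| - 1$, so for this basis the number of unpruned directions in the worst case grows linearly with $n$.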

Finally,  we  use  the  notation  and  discussion  above  to  state  the  following. 

Proposition 8.4 If $\nabla f(x_k)$ is available at iteration $k$ of the GPS
algorithm, then the poll set at $x_k$ has minimal cardinality $p(D, -\nabla f(x_k))$,
which is an upper bound on the number of function evaluations required
to execute the POLL step using a minimal poll set.

Proof. The proof follows directly from Definition 8.2 and from the discussion
following it. ■

In the next section, we propose a set of directions rich enough so that, when the
gradient is available, the upper bound on the number of function evaluations in the
POLL step (see Proposition 8.4) is one; thus $D_k^p$ contains a single element.
8.3.1 Pruning with the Gradient

We show here that the set $\mathbb{D} = \{-1,0,1\}^n$ prunes to a singleton, i.e., for any nonzero
vector $v \in \mathbb{R}^n$, there exists a positive spanning set $\mathbb{D}(v) \subset \mathbb{D}$ such that the subset of
vectors of $\mathbb{D}(v)$ that makes a nonnegative inner product with $v$ consists of a single
element. The key lies in the construction of $\mathbb{D}(v)$, which is done by taking the union
of the ascent directions of $\mathbb{D}$ with an element $\hat{d}$ of $\mathbb{D}$ that satisfies the properties,

$$ \text{if } v_i = 0, \text{ then } \hat{d}_i = 0, \tag{8.2} $$
$$ \text{if } v_i \neq 0, \text{ then } \hat{d}_i \neq -\mathrm{sign}(v_i), \tag{8.3} $$
$$ \text{if } |v_i| = \|v\|_\infty, \text{ then } \hat{d}_i = \mathrm{sign}(v_i), \tag{8.4} $$

for every $i \in N$, with the convention that $\mathrm{sign}(0) = 0$. Our construction will be such
that $\hat{d}$ will be the only unpruned member of $\mathbb{D}(v)$. The following technical lemma is
necessary for the proof of Proposition 8.6.


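Lemma 8.5 If $\delta > 1$ then

$$ \frac{\delta - \sqrt{\delta^2 - 1}}{\sqrt{\delta^2 - 1}} > \frac{\sqrt{\delta^2 + 1} - \delta}{2\delta - \sqrt{\delta^2 + 1}}. $$

Proof. If $\delta > 1$, then the above denominators are positive and the following
inequalities are equivalent to the one above:

$$ (\delta - \sqrt{\delta^2 - 1})(2\delta - \sqrt{\delta^2 + 1}) > (\sqrt{\delta^2 + 1} - \delta)\sqrt{\delta^2 - 1} $$
$$ 2\delta^2 - \delta\sqrt{\delta^2 + 1} > \sqrt{\delta^2 - 1}\,\big(\sqrt{\delta^2 + 1} - \delta + 2\delta - \sqrt{\delta^2 + 1}\big) $$
$$ 2\delta > \sqrt{\delta^2 + 1} + \sqrt{\delta^2 - 1} $$
$$ 2\delta^2 > 2\sqrt{\delta^4 - 1}. $$

This last inequality obviously holds for any $\delta > 1$. ■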
We first show the existence of vectors in $\mathbb{D}$ satisfying properties (8.2)-(8.4). This
means that there are various choices for $\mathbb{D}(v)$. Three different choices for the single
unpruned direction are given, and we will discuss their significance below.

Proposition 8.6 Let $\mathbb{D} = \{-1,0,1\}^n$. For any nonzero $v \in \mathbb{R}^n$, properties
(8.2)-(8.4) are satisfied by each vector $d^{(1)}$, $d^{(2)}$, and $d^{(\infty)}$, where for
each $i \in N$,

$$ d^{(1)}_i = \begin{cases} \mathrm{sign}(v_i) & \text{if } |v_i| = \|v\|_\infty, \\ 0 & \text{otherwise,} \end{cases} \tag{8.5} $$
$$ d^{(2)} \in \arg\max_{d \in \mathbb{D} \setminus \{0\}} \frac{v^T d}{\|d\|_2}, \tag{8.6} $$
$$ d^{(\infty)}_i = \mathrm{sign}(v_i). \tag{8.7} $$


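Proof. Let $v$ be a nonzero vector in $\mathbb{R}^n$. The proof is trivial for $d^{(1)}$ and for $d^{(\infty)}$.
The remainder of the proof is for $d^{(2)}$. Observe that since $d^{(\infty)} \in \mathbb{D} \setminus \{0\}$, the optimal
value of the maximization operator in (8.6) is strictly positive.

To verify (8.3), suppose that $d^{(2)}_i = -\mathrm{sign}(v_i)$ for some index $i \in N$ with $v_i \neq 0$.
Define $d_\ell = d^{(2)}_\ell$ for $\ell \in N \setminus \{i\}$ and $d_i = \mathrm{sign}(v_i)$. The optimality of $d^{(2)}$ is
contradicted:

$$ \frac{v^T d^{(2)}}{\|d^{(2)}\|_2} = \frac{1}{\|d^{(2)}\|_2} \Big( \sum_{\ell \in N \setminus \{i\}} v_\ell d^{(2)}_\ell - |v_i| \Big) < \frac{1}{\|d^{(2)}\|_2} \Big( \sum_{\ell \in N \setminus \{i\}} v_\ell d^{(2)}_\ell + |v_i| \Big) = \frac{v^T d}{\|d\|_2}. $$

To verify (8.2), suppose that $v_i = 0 \neq d^{(2)}_i$ for some index $i \in N$. Define $d_\ell = d^{(2)}_\ell$
for $\ell \in N \setminus \{i\}$ and $d_i = 0$. Note that (8.3) guarantees that $d \in \mathbb{D} \setminus \{0\}$, thus
$\|d\|_2^2 = \|d^{(2)}\|_2^2 - 1 > 0$. The optimality of $d^{(2)}$ is contradicted:

$$ 0 < \frac{v^T d^{(2)}}{\|d^{(2)}\|_2} = \frac{\sum_{\ell \in N \setminus \{i\}} v_\ell d^{(2)}_\ell}{\|d^{(2)}\|_2} < \frac{\sum_{\ell \in N \setminus \{i\}} v_\ell d_\ell}{\|d\|_2} = \frac{v^T d}{\|d\|_2}. $$

To verify (8.4), we must first show that two other properties hold; namely, for any
$i,j \in N$,

(a) if $|v_i| > |v_j|$, then $|d^{(2)}_i| \geq |d^{(2)}_j|$.

(b) if $|v_i| = |v_j|$, then $|d^{(2)}_i| = |d^{(2)}_j|$.

To prove (a), let $|v_i| > |v_j|$ for some $i,j \in N$. Suppose that $|d^{(2)}_i| < |d^{(2)}_j|$; i.e.,
$d^{(2)}_i = 0$, $d^{(2)}_j = \mathrm{sign}(v_j) \neq 0$. Define $d_\ell = d^{(2)}_\ell$ for $\ell \in N \setminus \{i,j\}$ and $d_i = \mathrm{sign}(v_i)$ and
$d_j = 0$. Then we have $\|d\|_2 = \|d^{(2)}\|_2$ and

$$ v^T d = \sum_{\ell \in N \setminus \{i,j\}} v_\ell d^{(2)}_\ell + |v_i| > \sum_{\ell \in N \setminus \{i,j\}} v_\ell d^{(2)}_\ell + |v_j| = v^T d^{(2)}, $$

which contradicts the definition of $d^{(2)}$.

To prove (b), let $|v_i| = |v_j|$ for some $i,j \in N$. If $v_i = v_j = 0$, then $d^{(2)}_i = d^{(2)}_j = 0$.
Now consider the case where both $v_i$ and $v_j$ are nonzero, and suppose that $|d^{(2)}_i| \neq
|d^{(2)}_j|$; without any loss of generality, suppose that $d^{(2)}_i = 0$, $d^{(2)}_j = \mathrm{sign}(v_j) \neq 0$. Let
$\delta = \|d^{(2)}\|_2$. Define $d_\ell = d^{(2)}_\ell$ for $\ell \in N \setminus \{i\}$ and $d_i = \mathrm{sign}(v_i)$. If $\delta = 1$, then a
contradiction follows:

$$ \frac{v^T d^{(2)}}{\|d^{(2)}\|_2} = |v_j| < \sqrt{2}\,|v_j| = \frac{|v_i| + |v_j|}{\sqrt{2}} = \frac{v^T d}{\|d\|_2}. $$

If $\delta > 1$, then set $a = \sum_{\ell \in N \setminus \{i,j\}} v_\ell d^{(2)}_\ell > 0$. Since $d^{(2)}$ is an optimal solution of (8.6),
then

$$ \frac{a}{\sqrt{\delta^2 - 1}} \leq \frac{v^T d^{(2)}}{\|d^{(2)}\|_2} = \frac{a + |v_j|}{\delta}. $$

The previous lemma leads to a contradiction:

$$ \frac{|v_j|}{a} \geq \frac{\delta - \sqrt{\delta^2 - 1}}{\sqrt{\delta^2 - 1}} > \frac{\sqrt{\delta^2 + 1} - \delta}{2\delta - \sqrt{\delta^2 + 1}} $$
$$ \Rightarrow \quad a\big(\sqrt{\delta^2 + 1} - \delta\big) < |v_j|\big(2\delta - \sqrt{\delta^2 + 1}\big) $$
$$ \Rightarrow \quad \frac{v^T d^{(2)}}{\|d^{(2)}\|_2} = \frac{a + |v_j|}{\delta} < \frac{a + |v_i| + |v_j|}{\sqrt{\delta^2 + 1}} = \frac{v^T d}{\|d\|_2}. $$

Now, with (a) and (b) proven, let $i \in N$ satisfy $|v_i| = \|v\|_\infty$. By (a) and (b), we
have $|d^{(2)}_i| \geq |d^{(2)}_j|$ for all $j \in N$. Since $d^{(2)} \neq 0$, it follows that $d^{(2)}_i = \mathrm{sign}(v_i)$. ■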
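The three directions of Proposition 8.6 can be illustrated numerically. The following is a minimal Python sketch, not the NOMADm implementation: the function names (`d_one`, `d_two`, `d_inf`, `satisfies_properties`) are hypothetical, and `d_two` solves (8.6) by brute-force enumeration of $\{-1,0,1\}^n$, which is only practical for small $n$.

```python
import itertools
import math


def sign(x):
    """Return -1, 0, or 1, with the convention sign(0) = 0."""
    return (x > 0) - (x < 0)


def d_one(v):
    """(8.5): keep only the components of largest magnitude."""
    vmax = max(abs(x) for x in v)
    return tuple(sign(x) if abs(x) == vmax else 0 for x in v)


def d_inf(v):
    """(8.7): componentwise sign of v."""
    return tuple(sign(x) for x in v)


def d_two(v):
    """(8.6) by brute force over the nonzero vectors of {-1,0,1}^n."""
    best, best_val = None, -math.inf
    for d in itertools.product((-1, 0, 1), repeat=len(v)):
        if not any(d):
            continue  # skip the zero vector
        val = sum(a * b for a, b in zip(v, d)) / math.sqrt(sum(b * b for b in d))
        if val > best_val:
            best, best_val = d, val
    return best


def satisfies_properties(v, d):
    """Check properties (8.2)-(8.4) for a candidate direction d."""
    vmax = max(abs(x) for x in v)
    for vi, di in zip(v, d):
        if vi == 0 and di != 0:
            return False  # violates (8.2)
        if vi != 0 and di == -sign(vi):
            return False  # violates (8.3)
        if abs(vi) == vmax and di != sign(vi):
            return False  # violates (8.4)
    return True


v = (-2.0, 0.0, 1.0)
candidates = {"d1": d_one(v), "d2": d_two(v), "dinf": d_inf(v)}
checks = {name: satisfies_properties(v, d) for name, d in candidates.items()}
```

For this particular $v$, the brute-force maximizer coincides with the sign vector, and all three candidates pass the property check, while a vector such as $(0,0,1)$ fails (8.4).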
Since the vectors $d^{(2)}$ and $d^{(\infty)}$ can be used to prune $\mathbb{D}$ to a singleton, and since $d^{(2)}$
is the vector in $\mathbb{D}$ that makes the smallest angle with $v$, it is tempting to conjecture
that any vector in $\mathbb{D}$ making a smaller angle with $v$ than $d^{(\infty)}$ can be used to prune to
a singleton. However, this is not the case, as the following counterexample shows.

Example 8.7 In $\mathbb{R}^5$, let $v = (1.1, 1, .01, .01, .01)$. Then $d^{(\infty)}$ is the
vector of ones in $\mathbb{R}^5$, and $\frac{v^T d^{(\infty)}}{\|d^{(\infty)}\|_2} = \frac{2.13}{\sqrt{5}} < 1$. Observe that $A(v)$ contains
no vectors with a 1 in the first component; otherwise, its inner product
with $v$ must be positive.

Now let $d = e_2$. Then $\frac{v^T d}{\|d\|_2} = 1 > \frac{v^T d^{(\infty)}}{\|d^{(\infty)}\|_2}$. Thus, $d \neq d^{(2)}$ makes a
smaller angle with $v$ than $d^{(\infty)}$ does, but $\{d\} \cup A(v)$ does not positively
span $\mathbb{R}^5$ because it cannot positively generate $e_1$. ■
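The counterexample can be checked by enumeration. This is a small Python sketch under the assumption that $A(v)$ is the set of vectors in $\{-1,0,1\}^5$ with negative inner product with $v$; the helper names are illustrative only. Failure to positively span is detected through the criterion of Theorem 3.3 with $b = e_1$.

```python
import itertools
import math

v = (1.1, 1.0, 0.01, 0.01, 0.01)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalized_product(v, d):
    """v^T d / ||d||_2, which orders directions by their angle with v."""
    return dot(v, d) / math.sqrt(dot(d, d))


# All nonzero vectors of {-1,0,1}^5 and the subset with negative inner product.
D = [d for d in itertools.product((-1, 0, 1), repeat=5) if any(d)]
A = [d for d in D if dot(v, d) < 0]

d_inf = (1, 1, 1, 1, 1)
e1 = (1, 0, 0, 0, 0)
e2 = (0, 1, 0, 0, 0)

ratio_inf = normalized_product(v, d_inf)  # equals 2.13 / sqrt(5)
ratio_e2 = normalized_product(v, e2)      # equals 1

# No member of A(v) has a +1 in its first component.
no_plus_one_first = all(d[0] != 1 for d in A)

# No member of {e2} U A(v) makes a positive inner product with b = e1,
# so by the criterion of Theorem 3.3 the set is not positively spanning.
some_direction_sees_e1 = any(dot(e1, d) > 0 for d in [e2] + A)
```

Here `ratio_e2` exceeds `ratio_inf`, confirming that $e_2$ makes the smaller angle with $v$, yet `some_direction_sees_e1` is false.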

The next theorem shows that for any nonzero vector $v \in \mathbb{R}^n$, any direction $\hat{d}$
satisfying properties (8.2)-(8.4) can be completed into a positive spanning set by
adding the directions in

$$ A(v) = \{ d \in \mathbb{D} : v^T d < 0 \}, \tag{8.8} $$

and consequently, $v$ prunes $\mathbb{D}$ to the single vector $\{\hat{d}\}$.

Theorem 8.8 The set $\mathbb{D} = \{-1,0,1\}^n$ has $p(\mathbb{D}) = 1$.

Proof. Let $v \in \mathbb{R}^n$ be nonzero, and let $\hat{d} \in \mathbb{D}$ be any vector satisfying the
properties (8.2)-(8.4), and define $\mathbb{D}(v)$ to be the union of $\{\hat{d}\}$ with the set $A(v)$ of directions
in $\mathbb{D}$ that make negative inner products with $v$.

Theorem 3.3 states that $\mathbb{D}(v) = \{\hat{d}\} \cup A(v) \subset \mathbb{D}$ positively spans $\mathbb{R}^n$ if and only if,
for any nonzero $b \in \mathbb{R}^n$ there exists a vector $d$ in $\mathbb{D}(v)$ such that $b^T d > 0$. Let $b \in \mathbb{R}^n$
be a nonzero vector. If $b^T \hat{d} \neq 0$, then clearly either of $\pm\hat{d}$ makes a positive inner
product with $b$, in which case, we are done since both are in $\mathbb{D}(v)$. Thus, consider the
situation where $b^T \hat{d} = 0$. The analysis is divided in two disjoint cases.

Case 1: $b_i v_i < 0$ for some index $i \in N$. Set $d_j = 0$ for $j \in N \setminus \{i\}$ and $d_i = \mathrm{sign}(b_i)$.
It follows that $d$ belongs to $A(v) \subset \mathbb{D}(v)$ and makes a positive inner product with $b$
since

$$ v^T d = v_i d_i = v_i\,\mathrm{sign}(b_i) = -v_i\,\mathrm{sign}(v_i) = -|v_i| < 0, \text{ and} $$
$$ b^T d = b_i d_i = b_i\,\mathrm{sign}(b_i) = |b_i| > 0. $$

Case 2: $b_i v_i \geq 0$ for all $i \in N$. From (8.2) and (8.3), we have $b_i \hat{d}_i \geq 0$ for all
$i \in N$, and since $b^T \hat{d} = 0$, it follows that $b_i \hat{d}_i = 0$ for all $i \in N$. Let $i,j \in N$ be
such that $|v_i| \geq |v_\ell|$ for all $\ell \in N$ and $b_j \neq 0$. Then $b_i = 0$, $\hat{d}_j = 0$, and by (8.4),
$\hat{d}_i = \mathrm{sign}(v_i) \neq 0$. Furthermore, since $\hat{d}_j = 0$, $|v_j| < |v_i|$. Set $d_\ell = 0$ for $\ell \in N \setminus \{i,j\}$
and $d_i = -\mathrm{sign}(v_i)$ and $d_j = \mathrm{sign}(b_j)$. It follows that $d$ belongs to $A(v) \subset \mathbb{D}(v)$ and
makes a positive inner product with $b$ since

$$ v^T d = v_i d_i + v_j d_j = -v_i\,\mathrm{sign}(v_i) + v_j\,\mathrm{sign}(b_j) \leq -|v_i| + |v_j| < 0, \text{ and} $$
$$ b^T d = b_i d_i + b_j d_j = b_j\,\mathrm{sign}(b_j) = |b_j| > 0. $$

Thus $\mathbb{D}^p(v) = \{\hat{d}\}$, and the proof is complete since $v \neq 0$ was arbitrary. ■
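For $n = 2$ the conclusion of the theorem is easy to check numerically, because a finite set of nonzero vectors positively spans the plane exactly when every angular gap between cyclically consecutive directions is strictly less than $\pi$. The Python sketch below uses that planar fact (it does not generalize to higher $n$); `positively_spans_plane` and `completed_set` are hypothetical helper names.

```python
import itertools
import math


def positively_spans_plane(dirs):
    """True if the nonzero 2-vectors in dirs positively span R^2.

    Uses the planar criterion: all cyclic angular gaps are below pi.
    """
    angles = sorted(math.atan2(d[1], d[0]) for d in dirs)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])  # wrap-around gap
    return max(gaps) < math.pi - 1e-12


def completed_set(v, d_hat):
    """Return {d_hat} together with A(v) = {d in {-1,0,1}^2 : v^T d < 0}."""
    A = [d for d in itertools.product((-1, 0, 1), repeat=2)
         if v[0] * d[0] + v[1] * d[1] < 0]
    return [d_hat] + A


v = (2, 1)
spans_with_d1 = positively_spans_plane(completed_set(v, (1, 0)))    # d^(1)
spans_with_dinf = positively_spans_plane(completed_set(v, (1, 1)))  # d^(inf)
spans_with_bad = positively_spans_plane(completed_set(v, (0, 1)))   # violates (8.4)
```

With $v = (2,1)$, both $d^{(1)} = (1,0)$ and $d^{(\infty)} = (1,1)$ complete $A(v)$ into a positive spanning set, whereas $(0,1)$, which makes a positive inner product with $v$ but violates (8.4), does not.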

We now apply these results to the determination of the poll set in the GPS
algorithm. Of course, for $v = -\nabla f(x_k)$, the set $A(-\nabla f(x_k))$ would contain all ascent
directions, which $-\nabla f(x_k)$ then prunes away. Therefore, the ascent directions in
$A(-\nabla f(x_k))$ do not need to be explicitly constructed.

The notation established earlier becomes clear. For $v = -\nabla f(x_k)$, the vectors
$d^{(1)}$ and $d^{(\infty)}$ are the negatives of the normalized $\ell_1$ and $\ell_\infty$ gradients of $f$ at $x_k$,
respectively (see [28]), while $d^{(2)}$ is the vector in $\mathbb{D}$ that makes the smallest angle with
$-\nabla f(x_k)$.

Theorem 8.9 If $\nabla f(x_k)$ is available at iteration $k$ of the GPS algorithm
with positive spanning set $D = \mathbb{D}$ and if $D_k = \mathbb{D}(-\nabla f(x_k)) = \{\hat{d}\} \cup
A(-\nabla f(x_k))$ with $\hat{d}$ satisfying the properties (8.2)-(8.4), then a single
function evaluation is required for the POLL step.

Proof. Theorem 8.8 and Proposition 8.4 with $v = -\nabla f(x_k)$ guarantee the result. ■

Thus, choosing the rich set of directions $\mathbb{D}$ and using the gradient information
allows us to evaluate the barrier objective function $f_X$ at a single poll point $x_k + \Delta_k \hat{d}$.
This may not require any evaluations of the objective function $f$ if the trial point lies
outside of $X$, or obviously if $\nabla f(x_k) = 0$.


8.3.2  Pruning  with  an  Approximation  of  the  Gradient 

In many engineering applications, derivatives are approximated or inaccurately
computed without much additional cost during the computation of the objective. Let us
define a measure of the quality of an approximation of a vector.

Definition 8.10 Let $g$ be a nonzero vector in $\mathbb{R}^n$ and $\epsilon \geq 0$. Define
$J^\epsilon(g) = \{ i \in N : |g_i| + \epsilon \geq \|g\|_\infty \}$, and for every $i \in N$ set

$$ d^\epsilon_i(g) = \begin{cases} \mathrm{sign}(g_i) & \text{if } i \in J^\epsilon(g), \\ 0 & \text{otherwise.} \end{cases} $$

The vector $g$ is said to be an $\epsilon$-approximation to the large components
of a nonzero vector $v \in \mathbb{R}^n$ if $i \in J^\epsilon(g)$ whenever $|v_i| = \|v\|_\infty$ and if
$\mathrm{sign}(g_i) = \mathrm{sign}(v_i)$ for every $i \in J^\epsilon(g)$.

Note that if $\epsilon = 0$, then $d^\epsilon(g)$ is identical to $d^{(1)}$ in (8.5), and if $\epsilon = \|g\|_\infty$,
then $d^\epsilon(g)$ is identical to $d^{(\infty)}$ in (8.7). The following result implicitly provides a
sufficiency condition on the quality of the approximation.
Proposition 8.11 Let $\epsilon \geq 0$ and $g$ be an $\epsilon$-approximation to the large
components of a nonzero vector $v \in \mathbb{R}^n$. Then $d^\epsilon(g)$ satisfies
properties (8.2)-(8.4).

Proof. Let $\epsilon \geq 0$ and $g \neq 0$ be an $\epsilon$-approximation to the large components of
$v \neq 0 \in \mathbb{R}^n$. The set $J^\epsilon(g)$ is nonempty and contains every $i$ for which $|v_i| = \|v\|_\infty$.
If $i \in J^\epsilon(g)$ then $d^\epsilon_i(g) = \mathrm{sign}(g_i) = \mathrm{sign}(v_i)$, and therefore properties (8.2)-(8.4) are
satisfied. If $i \notin J^\epsilon(g)$ then the properties are trivially satisfied. ■
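Definition 8.10 translates almost line by line into code. The following Python sketch is illustrative only (the names `d_eps` and `is_eps_approximation` are assumptions, and indices are 0-based); it shows how the threshold $\epsilon$ interpolates between $d^{(1)}$ and $d^{(\infty)}$ and how the sign-matching condition is tested.

```python
def sign(x):
    """Return -1, 0, or 1, with the convention sign(0) = 0."""
    return (x > 0) - (x < 0)


def d_eps(g, eps):
    """d^eps(g): sign(g_i) on the index set J^eps(g), zero elsewhere."""
    gmax = max(abs(x) for x in g)
    return tuple(sign(x) if abs(x) + eps >= gmax else 0 for x in g)


def is_eps_approximation(g, v, eps):
    """Check whether g is an eps-approximation to the large components of v."""
    gmax = max(abs(x) for x in g)
    vmax = max(abs(x) for x in v)
    J = [i for i, x in enumerate(g) if abs(x) + eps >= gmax]
    # Every largest-magnitude component of v must be indexed by J^eps(g) ...
    covers_large = all(i in J for i, x in enumerate(v) if abs(x) == vmax)
    # ... and g must match the sign of v on all of J^eps(g).
    signs_match = all(sign(g[i]) == sign(v[i]) for i in J)
    return covers_large and signs_match


g = (3.0, -2.5, 0.2)   # approximation
v = (4.0, -1.0, -0.3)  # vector being approximated

d_small = d_eps(g, 0.0)  # only the largest component of g survives
d_mid = d_eps(g, 1.0)    # components within 1.0 of the largest survive
d_full = d_eps(g, 3.0)   # eps = ||g||_inf keeps every component
```

In this example `g` is an $\epsilon$-approximation of `v` for $\epsilon = 1$ but not for $\epsilon = 3$, because the third components disagree in sign once they enter $J^\epsilon(g)$.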

The next result establishes a mild accuracy requirement for an approximation of
the negative gradient to prune $\mathbb{D}(-\nabla f(x_k))$ to a singleton.

Lemma 8.12 Let $\epsilon \geq 0$ and $g_k$ be an $\epsilon$-approximation to the large
components of $\nabla f(x_k) \neq 0$. If there exists a vector $\hat{d}$ satisfying (8.2)-(8.4)
for both $v = -\nabla f(x_k)$ and $v = -g_k$, then the set $\{\hat{d}\} \cup A(-\nabla f(x_k))$
positively spans $\mathbb{R}^n$ and prunes to $\{\hat{d}\}$.

Proof. This follows directly from the proof of Theorem 8.8. ■

The significance of this result resides in the wide latitude allowed for the
approximation. For example, if $d^{(\infty)}$ with $v = -g_k$ is used (see (8.7)) to obtain $\hat{d}$, then we
only need the components of the approximation $g_k$ to match signs with those of the
true gradient $\nabla f(x_k)$ in order to prune to a singleton. If $d^{(1)}$ with $v = -g_k$ is used
(see (8.5)) to obtain $\hat{d}$, then we only need the component of largest magnitude of $g_k$
to have the same sign as that of $\nabla f(x_k)$.

The following result shows that if an approximation to the gradient matches signs
with the true gradient for all components of sufficiently large magnitude, then the
approximation prunes the set of poll directions to a singleton.

Theorem 8.13 Let $\epsilon \geq 0$. At iteration $k$ of the GPS algorithm with
$D = \mathbb{D} = \{-1,0,1\}^n$, let $g_k \neq 0$ be an $\epsilon$-approximation of $\nabla f(x_k) \neq 0$.
Then the set of directions $D_k = \{d^\epsilon(g)\} \cup A(-\nabla f(x_k))$ positively spans
$\mathbb{R}^n$. Thus, $D_k^p = \{d^\epsilon(g)\}$.

Proof. This follows directly by combining the fact that $d^\epsilon(g)$ is the same if
constructed from $v = \nabla f(x_k)$ or $v = g_k$ and by Lemma 8.12. ■

Observe that when $\epsilon = 0$ or $\epsilon = \|g\|_\infty$, then Theorem 8.13 reduces to Theorem 8.9
with $d^\epsilon(g) = d^{(1)}$ or $d^\epsilon(g) = d^{(\infty)}$, respectively.

8.3.3  Pruning  with  Incomplete  Derivative  Information 

It is possible that in some instances and some iterations, the entire gradient is not
available, but a few partial or directional derivatives might be known. This situation
may occur, for example, when $f(x,y) = g(x,y) \times h(y)$ where $g(x,y)$ is analytically
given, but the structure of $h(y)$ is unknown. In such a case the partial derivatives of
$f$ can be computed with respect to $x$ but not with respect to $y$.

If at iteration $k$, the directional derivative $f'(x_k; d)$ exists, is available, and is
nonnegative, then polling in the direction $d$ from $x_k$ will fail if the mesh size parameter
is small enough. This leads to the following result.

Proposition 8.14 Let $D_k \subseteq D$ be a positive spanning set and $V_k$ be the
subset of directions $d$ in $D_k$ for which the directional derivative $f'(x_k; d) \geq 0$
is known. Then the pruned set of directions is $D_k^p = D_k \setminus V_k$ and contains
$|D_k| - |V_k|$ directions.

Proof. The result follows from the fact that the directional derivative $f'(x_k; d)$ equals
$d^T \nabla f(x_k)$. ■

Note that when the gradient exists, if $f'(x_k; d)$ is nonpositive, then $f'(x_k; -d)$
is nonnegative. The most typical application of incomplete gradient information is
when some, but not all, partial derivatives are known. The spanning set $\mathbb{D}$ can be
used to reduce as much as possible the size of the pruned poll set. The approach for
doing so can be outlined as follows:

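Pruning Operation for Incomplete Gradients
For $x_k \in \mathbb{R}^n$ and a set of nonzero partial derivatives $\frac{\partial f}{\partial x_{i_j}}(x_k)$, $j = 1, 2, \ldots, t \leq n$,

1. Set $V_k = \{ s_{i_j} e_{i_j} : j = 1, 2, \ldots, t \} \subset \mathbb{D}$, where $s_{i_j} = \mathrm{sign}\big(\frac{\partial f}{\partial x_{i_j}}(x_k)\big)$, $j = 1, 2, \ldots, t$,

2. Set $W_k = \{ e_\ell : \ell \in L \}$, where $L = N \setminus \{i_1, i_2, \ldots, i_t\}$,

3. Set $u = -\sum_{j=1}^{t} s_{i_j} e_{i_j} - \sum_{\ell \in L} e_\ell$,

4. Set $D_k = V_k \cup W_k \cup \{u\}$, and prune $D_k$ to $D_k^p = W_k \cup \{u\}$.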

For this construction, we can prove the following result:

Corollary 8.15 Let the partial derivatives at iteration $k$, $\frac{\partial f}{\partial x_{i_j}}(x_k)$ for
$j = 1, 2, \ldots, t$, be available and all nonzero. If $\nabla f(x_k)$ exists, then the
pruning operation yields a pruned set $D_k^p$ with $n + 1 - t$ directions.

Proof. By the construction above, it is clear that $B = V_k \cup W_k$ forms a basis for
$\mathbb{R}^n$, and $u = -\sum_{i=1}^{n} b_i$, where $b_i \in B$, $i = 1, 2, \ldots, n$. Thus $D_k = B \cup \{u\}$ forms a
positive basis for $\mathbb{R}^n$ with $|D_k| = n + 1$. Furthermore, since $\nabla f(x_k)$ exists, for any
$v_j \in V_k$, we have $f'(x_k; v_j) = \nabla f(x_k)^T v_j = \nabla f(x_k)^T s_{i_j} e_{i_j} = \big|\frac{\partial f}{\partial x_{i_j}}(x_k)\big| > 0$. Since $V_k$
has $t$ directions, the result follows from Proposition 8.14. ■

The  following  example  illustrates  this  result. 

Example 8.16 Consider $f : \mathbb{R}^3 \to \mathbb{R}$ with $\frac{\partial f}{\partial x_1}(x_k) > 0$ and $\frac{\partial f}{\partial x_2}(x_k) < 0$,
where $f$ is continuously differentiable at $x_k$. Then, with $D = \{-1,0,1\}^n$,
by defining $D_k$ as

$$ D_k = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}, $$

the pruned set $D_k^p$ can be constructed using only the two last columns of
$D_k$.
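The pruning operation for incomplete gradients can be sketched in a few lines of Python. This is an illustration of the four steps above, not the NOMADm code; the function name `incomplete_gradient_poll_set` and the dictionary interface (0-based variable index mapped to a known nonzero partial derivative) are assumptions made for the example.

```python
def sign(x):
    """Return -1, 0, or 1, with the convention sign(0) = 0."""
    return (x > 0) - (x < 0)


def incomplete_gradient_poll_set(n, known_partials):
    """Build D_k and its pruned subset from the known nonzero partials.

    known_partials maps a 0-based variable index to the value of the
    corresponding (nonzero) partial derivative at x_k.
    """
    def unit(i, s=1):
        # s times the i-th coordinate vector of R^n
        return tuple(s if k == i else 0 for k in range(n))

    # Step 1: signed coordinate directions for the known partials (ascent).
    V = [unit(i, sign(p)) for i, p in sorted(known_partials.items())]
    # Step 2: plain coordinate directions for the unknown partials.
    W = [unit(i) for i in range(n) if i not in known_partials]
    # Step 3: u is the negative sum of all directions in V and W.
    u = tuple(-sum(column) for column in zip(*(V + W)))
    # Step 4: D_k = V U W U {u}; the pruned set drops V.
    return V + W + [u], W + [u]


# Example 8.16: df/dx1 > 0 and df/dx2 < 0 are known, df/dx3 is unknown.
D_k, pruned = incomplete_gradient_poll_set(3, {0: 2.0, 1: -0.5})
```

The columns of the matrix in Example 8.16 are recovered as the elements of `D_k`, and `pruned` contains the $n + 1 - t = 2$ directions predicted by Corollary 8.15. The numerical derivative values `2.0` and `-0.5` are arbitrary; only their signs matter.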


The results derived in this section are compatible with the previous ones. Indeed,
if the full gradient is available and if all its components are nonzero, then using $\mathbb{D}$ the
pruned set requires $n + 1 - n = 1$ direction, i.e., $p(\mathbb{D}) = 1$.


8.3.4  Pruning  with  Linear  Constraints 

A similar strategy can be used in the bound or linearly constrained case. Recall that
$X = \{ x \in \mathbb{R}^n : Ax \leq b \}$ defines the feasible region, where $A \in \mathbb{Q}^{m \times n}$ and $b \in \mathbb{R}^m$.
Set $M = \{1, 2, \ldots, m\}$ and let $a_j$ be the $j$-th row of $A$ for $j \in M$.

For any $\epsilon \geq 0$ and $x \in X$, define $A_\epsilon(x) = \{ j \in M : a_j x \geq b_j - \epsilon \}$, the set of
$\epsilon$-active constraints (as used in [95]). The set of positive spanning directions $D$ is
implicitly defined to contain all tangent cone generators of all points on all faces of
the polytope $X$. Obviously, this set is never explicitly constructed. However, for any
$x \in X$ and for a given $\epsilon > 0$, we will need to construct, as suggested in [95], the
set $T(x) \subset D$ of tangent cone generators to all the $\epsilon$-active constraints. With this
construction, if $D_k \subseteq D$ is a positive spanning set containing $T(x_k)$, then $D_k$ will
conform to the boundary of $X$ for $\epsilon > 0$.
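As a small illustration of the $\epsilon$-active set, the Python sketch below evaluates the definition directly for a box written in the form $Ax \leq b$. The function name `eps_active` and the sample box are assumptions for the example, and constraint indices are 0-based rather than drawn from $M = \{1, \ldots, m\}$.

```python
def eps_active(A, b, x, eps):
    """Indices j with a_j x >= b_j - eps, i.e. the eps-active constraints."""
    active = []
    for j, (row, bj) in enumerate(zip(A, b)):
        if sum(a * xi for a, xi in zip(row, x)) >= bj - eps:
            active.append(j)
    return active


# The box -5 <= x1 <= 10, -5 <= x2 <= 10 written as A x <= b.
A = [(1, 0), (-1, 0), (0, 1), (0, -1)]
b = [10, 5, 10, 5]

x = (9.5, -5.0)
exactly_active = eps_active(A, b, x, 0.0)  # only the lower bound on x2
nearly_active = eps_active(A, b, x, 1.0)   # also the upper bound on x1
```

At this point the lower bound on the second variable is active, and with $\epsilon = 1$ the upper bound on the first variable becomes $\epsilon$-active as well, so tangent cone generators for both bounds would be included in $T(x)$.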
If the gradient is known, then we can find a bound for the minimal cardinality of
the pruned set through the following approach and subsequent proposition.

Pruning Operation under Linear Constraints

For $x_k \in X$ with known gradient $\nabla f(x_k)$,

1. Let $T(x_k)$ be the set of tangent cone generators to all $\epsilon$-active constraints.

2. Set $D' = T(x_k) \cup \{ d \in D : \nabla f(x_k)^T d > 0 \}$.

3. If $D'$ positively spans $\mathbb{R}^n$, set $D_k = D'$; otherwise, set $D_k = D' \cup \{\hat{d}\}$, where
$\hat{d}$ satisfies properties (8.2)-(8.4).

4. Prune $D_k$ to $D_k^p$.

Proposition 8.17 Let $\epsilon > 0$. If $\nabla f(x_k)$ is known at iteration $k$, then the
pruning operation yields a pruned set $D_k^p$ containing at most $|T(x_k)| + 1$
directions.

Proof. By construction, in the worst case, $D_k = T(x_k) \cup \{\hat{d}\} \cup \{ d \in D : \nabla f(x_k)^T d >
0 \}$, and $D_k^p = T(x_k) \cup \{\hat{d}\}$, whose cardinality is $|T(x_k)| + 1$. ■

Again, this result is in agreement with previous ones: in the unconstrained case, or
if there are no $\epsilon$-active constraints, then $A_\epsilon(x_k)$ is empty, reducing $D_k^p$ to a singleton.
Also, when $\{\hat{d}\}$ is used, then $\hat{d}$ points outside $X$ for some boundary point within $\epsilon$ of
$x_k$, and if the trial point $x_k + \Delta_k \hat{d}$ does not belong to $X$, then $f$ will not be evaluated
at that trial point.

If only incomplete derivative information is available, the gradient cannot be used
as strongly to prune the set of directions. Define $D' = T(x_k) \cup \{ d \in V_k : f'(x_k; d) \geq 0 \}$,
then construct $D_k$ by completing (if necessary) $D'$ into a positive spanning set. As
before, $D_k^p$ is obtained by pruning ascent directions from $D_k$. By construction, all
directions in $\{ d \in V_k : f'(x_k; d) \geq 0 \}$ will get pruned, and some directions in $T(x_k)$
might get pruned.

8.4  Convergence  Results 

Except for a slight modification in one key theorem, convergence results for this
method are identical to those of Audet and Dennis [6], and are detailed in [2].
Furthermore, they are summarized in Chapter 4, and proved in the more general
MVP case in Chapter 5. Thus, convergence results are omitted here, except in the
following theorem where a difference is noted.

But first, as in Chapter 4, we make the following assumptions:

A1: All iterates $\{x_k\}$ produced by the algorithm lie in a compact set.

A2: The set of directions $D = GZ$, as defined in (4.3), includes tangent cone
generators for every point in $X$.

A3: The rule for selecting positive spanning sets $D_k$ conforms to $X$ for some $\epsilon > 0$.

Note that, as in Chapter 4, Assumption A2 is satisfied if $G = I$ and the
constraint coefficient matrix $A$ is rational.

Theorem 8.18 Let $\hat{x}$ be the limit of a refining subsequence, and let
$d$ be any direction in $D$ for which $f$ at a POLL step was evaluated or
pruned by the gradient for infinitely many iterates in the subsequence.
Under assumptions A1-A3, if $f$ is Lipschitz near $\hat{x}$, then the generalized
directional derivative of $f$ at $\hat{x}$ in the direction $d$ is nonnegative, i.e.,
$f^\circ(\hat{x}; d) \geq 0$.

Proof. Let $\{x_k\}_{k \in K}$ be a refining subsequence with limit point $\hat{x}$ and $d \in D$ obtained
as in the statement of the theorem (finiteness of $D$ ensures the existence of $d$). Since
$f$ is locally Lipschitz near $\hat{x}$, we have from Clarke [31] by definition that:

$$ f^\circ(\hat{x}; d) = \limsup_{y \to \hat{x},\; t \downarrow 0} \frac{f(y + td) - f(y)}{t} \geq \limsup_{k \in K} \frac{f(x_k + \Delta_k d) - f(x_k)}{\Delta_k}. \tag{8.9} $$

First note that since $f$ is Lipschitz near $\hat{x}$, it must be finite near $\hat{x}$. Since infeasible
points are not evaluated by the algorithm, the hypothesis of $f$ being evaluated or
pruned infinitely many times in direction $d$ guarantees that infinitely many right-hand
quotients are defined.

The analysis is divided in two cases. First, consider the case where the gradient
is evaluated only a finite number of times in the subsequence $\{x_k\}_{k \in K}$. Then this
finite number of iterates may be ignored, and therefore, for $k$ sufficiently large, all
the POLL steps in the direction $d$, $x_k + \Delta_k d$, are feasible. If they had not been, then
$f$ would not have been evaluated. Thus, we have that infinitely many of the right-hand
quotients of (8.9) are defined. Then all of them must be nonnegative or else the
corresponding POLL step would have found an improved mesh point, a contradiction
(recall that refining subsequences are constructed from mesh local optimizers).

Second, consider the case where there is an infinite number of iterates in the
subsequence where the gradient is used. Then there is a subsequence that converges
to $\hat{x}$ for which $d^T \nabla f(x_k) \geq 0$ and thus the right-hand side of (8.9) is bounded below
by zero. ■


8.5  Numerical  Experiments 

The algorithms described in this chapter were implemented in the Matlab®
software NOMADm [1] (discussed in Section 7.1), and applied to 18 problems from the
CUTE [18] collection, as well as to a simple example to illustrate the value of a simple,
but not trivial, SEARCH step.

Most of the CUTE problems we tested have multiple first-order stationary points,
which is important if we are to compare the solutions found by various versions of the
algorithm. For each problem, performance of these algorithms is compared among
several poll strategies. No SEARCH method is employed, and all problems are solved
to within $10^{-4}$ accuracy or until 50,000 function evaluations are performed. (This
was the reasonable limit of the NOMADm code.) A mathematical description of each
problem is provided in Appendix A.

Specifically, Table 8.1 shows objective function value attained, number of function
evaluations, and number of gradient evaluations for each problem. The eight poll
strategies identified in the column headings of Table 8.1 differ in their choice of
directions and are described as follows:

• $\mathrm{Stand}_{2n}$: standard $2n$ directions, $D = [I, -I]$ with no gradient-pruning.

• $\mathrm{Stand}_{n+1}$: standard $n + 1$ directions, $D = [I, -e]$ with no gradient-pruning,
where $e$ is the vector of ones.

• $\mathrm{Grad}_{2n}$: gradient-pruned subset of the standard $2n$ directions.

• $\mathrm{Grad}_{n+1}$: gradient-pruned subset of the standard $n + 1$ directions.

• $\mathrm{Grad}^{(1)}_{3^n}$: gradient-pruned subset of $\mathbb{D} = \{-1,0,1\}^n$, pruned by $d^{(1)}$.

• $\mathrm{Grad}^{(2)}_{3^n}$: gradient-pruned subset of $\mathbb{D} = \{-1,0,1\}^n$, pruned by $d^{(2)}$.

• $\mathrm{Grad}^{(\infty)}_{3^n}$: gradient-pruned subset of $\mathbb{D} = \{-1,0,1\}^n$, pruned by $d^{(\infty)}$.

• $\mathrm{Grad}_{3^n,2n}$: a combination of $\mathrm{Grad}_{3^n}$ and $\mathrm{Grad}_{2n}$.

First, the example to illustrate the value of a very basic SEARCH step.
Example 8.19 We ran NOMADm on the problem,

$$ \min\; f(x,y) = x^3 + y^3 - 10(x^2 + y^2) $$
$$ \text{s.t.}\quad -5 \leq x \leq 10, \quad -5 \leq y \leq 10, $$

with an initial point of $x_0 = (0.5, 0.5)$. The results were:

• $\mathrm{Stand}_{2n}$ found $x = (6.666, -5)$ with $f(x) = -523$ in 94 evaluations
of $f$ and no evaluations of $\nabla f$.

• $\mathrm{Grad}_{3^n}$ found $x = (6.666, 6.666)$ with $f(x) = -296$ in 33 evaluations
of $f$ and 17 evaluations of $\nabla f$.

• $\mathrm{Grad}_{3^n}$ with only a two point feasible random search found $x =
(-5, -5)$ with $f(x) = -750$ in 85 evaluations of $f$ and 5 evaluations
of $\nabla f$.

Thus, $\mathrm{Stand}_{2n}$ found a good local minimizer, and $\mathrm{Grad}_{3^n}$ without a
search finds a higher local optimum. But, when we add the 2-point random
search to $\mathrm{Grad}_{3^n}$, we find the global optimum.
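The reported objective values are easy to reproduce. The short Python sketch below evaluates the objective of Example 8.19 at the three reported points, under the assumption that the rounded coordinate 6.666 stands for $20/3$, the nonzero root of $\partial f / \partial x = 3x^2 - 20x$; the function name `f` mirrors the example and nothing here calls NOMADm.

```python
def f(x, y):
    """Objective of Example 8.19."""
    return x**3 + y**3 - 10 * (x**2 + y**2)


def partial(t):
    """Partial derivative of f with respect to either variable at value t."""
    return 3 * t**2 - 20 * t


t_star = 20 / 3  # interior stationary value in each coordinate (about 6.666)

value_stand = f(t_star, -5)      # point reported for the standard 2n poll
value_grad = f(t_star, t_star)   # point reported for the pruned poll, no search
value_search = f(-5, -5)         # point reported once the random search is added
```

Rounded to integers, the three values agree with the $-523$, $-296$, and $-750$ quoted above, and the corner $(-5,-5)$ indeed gives the lowest of the three.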

For the CUTE problem results given in Table 8.1 below, numbers appearing in
parentheses indicate that a second set of runs was performed for that problem, but
with a slightly perturbed initial point. In these cases, if a particular entry contains no
number in parentheses, then that value remained unchanged in the second run. The
initial point was perturbed whenever the number of required function evaluations for
a run was very small. This occurs when the choices of initial point and poll directions
quickly drive the algorithm to an exact stationary point (i.e., $\nabla f(x_k) = 0$). Since this
phenomenon is not common in practice, we did not want a particular strategy
to appear gratuitously advantageous over another.

It should be noted that three problems, ALLINIT, ALLINITU, and EXPFIT, do
not have starting points associated with them. Since ALLINIT is the constrained
version of ALLINITU, we chose the same reasonable and feasible initial point for
both problems, while the vector of ones was chosen for EXPFIT. Also, the problem
OSLBQP has an infeasible starting point with respect to one of its lower bounds. Our
NOMADm software automatically restores any such variable to its nearest bound
before proceeding with the algorithm. (In the case of general linear constraints, our
software shifts an infeasible starting point to the nearest feasible point with respect
to the Euclidean norm.)

In general, the gradient-pruned GPS methods converged in fewer function
evaluations than the standard GPS methods, but they often terminated at a point with
a higher function value (and vice versa), particularly if only one gradient-pruned
descent direction was used in the POLL step. In fact, in no case did a standard method
require fewer function evaluations than a "$3^n$" gradient method, and in only one case
(MDHOLE, $\mathrm{Stand}_{n+1}$) did a standard method require fewer function evaluations than
its gradient-pruned counterpart with the same poll directions (e.g., $\mathrm{Stand}_{2n}$ versus
$\mathrm{Grad}_{2n}$, $\mathrm{Stand}_{n+1}$ versus $\mathrm{Grad}_{n+1}$). Also, $\mathrm{Grad}_{3^n,2n}$ was always at least as fast as
$\mathrm{Stand}_{2n}$.

However, the problems ALLINITU, MARATOSB, MEXHAT, and OSBORNEA
are examples of cases where a gradient-pruned method converged to a lower point
than a standard poll. In fact, $\mathrm{Grad}_{3^n}$ (with only one function evaluation per iteration)
achieved a lower function value for OSBORNEA than both standard poll methods
despite fewer function evaluations.

The problem PALMER1 is an interesting example with typical behavior. The
three highly-pruned $\mathrm{Grad}_{3^n}$ strategies converge the quickest, but to the much poorer
solution of 43904 than the other methods. The methods that used the most directions,
$\mathrm{Stand}_{2n}$, $\mathrm{Grad}_{2n}$, and $\mathrm{Grad}_{3^n,2n}$, converge to the best solution of 11760, with $\mathrm{Grad}_{2n}$
saving a considerable number of function evaluations over $\mathrm{Stand}_{2n}$. The two $n + 1$
direction methods converge to a solution of 21166, with gradient-pruning again saving
a significant number of function evaluations. Finally, the $\mathrm{Grad}_{3^n,2n}$ strategy achieves
the best optimal value faster than $\mathrm{Stand}_{2n}$, but it was not able to match $\mathrm{Grad}_{2n}$ in
function evaluations, perhaps because it requires slightly more function evaluations
per POLL step.
Table 8.1 Numerical Results for Selected CUTE Test Problems.

Problem / measure       Stand2n      Standn+1     Grad2n       Gradn+1      Grad3^n(1)   Grad3^n(2)   Grad3^n(inf) Grad3^n,2n

ALLINIT, n = 4
  Final f-value         26.7643      26.7643      26.7643      26.7643      26.7643      26.7643      26.7643      26.7643
  Function evaluations  85           85           47           47           47           56           47           56
  Gradient evaluations  0            0            10           10           10           10           10           10

ALLINITU, n = 4
  Final f-value         6.9288       5.7444       6.9288       5.7661       5.8006       5.7812       9.2523       6.9293
  Function evaluations  1628         425          846          133          30           65           25           650
  Gradient evaluations  0            0            140          20           7            14           7            65

BARD, n = 3
  Final f-value         0.0082       0.0122       0.0082       0.0089       0.0089       0.0083       0.0091       0.0082
  Function evaluations  11061        50000+       5963         50000+       2430         7799         1263         4871
  Gradient evaluations  0            0            1640         16384        1018         2474         636          722

BOX2, n = 3
  Final f-value         0            0            0            0            0            0            0            0
  Function evaluations  61           48           31           32           25           25           25           56
  Gradient evaluations  0            0            2            2            8            8            8            4

BOX3, n = 3
  Final f-value         0            0            0            0            0.5656       0.2253       0.5656       0
  Function evaluations  91           63           46           32           17           32           17           62
  Gradient evaluations  0            0            2            2            2            2            2            2

DENSCHNA, n = 2
  Final f-value         0            0            0            0            0            0            0            0
  Function evaluations  73(148)      47(156)      67(86)       47(92)       5(56)        2(23)        2(23)        2(83)
  Gradient evaluations  0            0            4(22)        2(30)        3(20)        2(8)         2(8)         2(14)

DENSCHNB, n = 2
  Final f-value         0            0            0            0            0            0            0            0
  Function evaluations  68(119)      130(95)      68(66)       87(53)       6(50)        4(28)        4(27)        4(71)
  Gradient evaluations  0            0            3(15)        29(16)       3(16)        3(11)        3(10)        3(10)

DENSCHNC, n = 2
  Final f-value         0            0(0.0001)    0            0(0.001)     0            0            0            0
  Function evaluations  75(119)      54(662)      67(68)       50(384)      7(87)        5(103)       4(62)        16(98)
  Gradient evaluations  0            0            5(15)        4(168)       4(31)        3(33)        3(28)        5(16)

EXPFIT, n = 2
  Final f-value         0.2405       0.2406       0.2405       0.2406       0.2405       0.2405       0.2405       0.2405
  Function evaluations  300          999          191          524          164          163          81           198
  Gradient evaluations  0            0            66           251          66           69           37           47

MARATOSB, n = 2
  Final f-value         -1           0            -0.0041      0            -1           -1           -1           -1
  Function evaluations  66           50           42           34           17           17           17           17
  Gradient evaluations  0            0            3            3            2            2            2            2

Stand2n  I  Standn+i  Grad2n  Gradn-i-i  |  Grad 


Stand2n  Standn+i  Grad2n  Gradn+i  |  Grad^ 


11127.6 

15 

1 


-0.0401 

203 

54 


-0.0401 

31 

7 


-0.0401 

32 

4 


-0.0401 

285 

53 


Standn+i  Grad2n  Gradn-j-i  Grader, 


1446638  1692509  1446638  [23591906  5469396  22772194  1692509 

2581  1243  1439  395  1768  298  1703 

0  434  659  204  767  155  434 


Standn+i  Grad2n  Gradn+i  Grad^/"  GradgJ  Grad3n 
0.00029  |  0.00106  0.00029  0.00185  0.00015  0.00089~ 

13300  1222  7783  496  1455  542 

0  183  2042  286  528  314 


Stand2n  Standn-j-i 
6^25  |  6^25 


Final  /-value _ 

Function  evaluations 
Gradient  evaluations 


PALMER1B,  n  =  2 

Final  /-value _ 

Function  evaluations 
Gradient  evaluations 


Final  /-value 


Function  evaluations 
Gradient  evaluations 


116.6 

50000+ 

0 


Stand2n 

4.013f= 

50000+ 

0 


0.00012 

1770 

237 


0.04014 

6574 

355 


Stand2n  Standn-j-i  Grad2n  Gradn+i 
11760  21166  11760  21166 

1498  438  730  208 

0  0  |  160  I  64 

Stand2n  I  Standn+i  Grad2n  Gradn+i 


7965.8 

50000+ 

0 


41.3 

50000+ 

5636 


393.8 

50000+ 

9213 


45.9 

50000+ 

4360 




13474.1 

50000+ 

25001 


5135.9 

50000+ 

25001 


Stand2n  |  Standn+i  |  Grad2 
16948398]  30078  | 125149 


50000+ 

0 


799100 

50000+ 

4230 
Chapter 9

Conclusions  and  Recommendations 

9.1  Summary  and  Conclusions 

A new class of algorithms for solving mixed variable optimization problems with
nonlinear constraints has been presented. It generalizes existing GPS algorithms
in that it reduces to the algorithm of [8] if there are no nonlinear constraints
and to that of [7] if there are no categorical variables. New convergence theorems
have been proved, with results consistent with those of existing algorithms.
Furthermore, the algorithm represents the first optimization algorithm with
provable convergence properties that can be applied to MVP problems with general
nonlinear constraints. Its practical use has been demonstrated by the solution of
the problem described in Chapter 7.

The  following  list  describes  the  new  contributions  of  this  work: 

•  the  concept  of  P-regularity  and  its  relationship  to  the  Clarke  calculus  given  in 
Section  3.3. 

•  the  observation  that  convergence  of  GPS  algorithms  for  MVP  problems  requires 
the  specific  assumption  that  discrete  neighbors  of  any  iterate  must  lie  on  the 
current  mesh. 

•  an  improved  construction  of  the  mesh  for  MVP  problems,  in  which  a  different 
generating  matrix  may  be  used  for  each  set  of  discrete  variable  values,  while 
preserving  all  the  convergence  properties  of  the  algorithms. 

• the new convergence results of Section 5.3 for the MGPS algorithm for bound
and linearly constrained MVP problems. These results apply weaker hypotheses
than those found in [8] to more fully characterize certain limit points, consistent
with existing theory [6, 7].

•  the  FMGPS  algorithm  described  in  Chapter  6,  with  all  of  its  development  and 
convergence  analysis  given  there. 

•  the  weakening  of  the  strict  differentiability  hypothesis  needed  for  convergence  to 
a  first-order  stationary  point  in  constrained  problems.  This  is  applied  through¬ 
out  the  results  of  Chapters  5  and  6. 

• the implementation of the algorithms in the Matlab® software package NOMADm
(see Section 7.1).

• a generalization of the thermal insulation system design problem described in
Section 7.2, in which variable cross-sectional areas are permitted and nonlinear
constraints on stress, mass, and thermal contraction are added.

•  the  successful  study  of  numerical  performance  of  the  FMGPS  algorithm  on  the 
thermal  insulation  system  problem,  which  demonstrates  the  ability  of  FMGPS 
to  solve  mixed  variable,  nonlinearly  constrained  problems. 

• much of the theory surrounding the GPS algorithm with derivative information;
most notably, the theorems that describe how to prune the {-1, 0, 1}^n positive
spanning set to a singleton when the gradient or an approximation is available.

•  the  numerical  experiments  in  Chapter  8,  in  which  different  approaches  are  tested 
on  problems  from  the  CUTE  test  set. 
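
The pruning idea mentioned in the list above can be illustrated with a minimal sketch. The rule used here, taking the negated sign vector of the gradient as the single poll direction, is an assumption chosen for simplicity; it is not claimed to be the rule established by the theorems, and the name `prune_to_singleton` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def prune_to_singleton(grad, tol=0.0):
    """Pick one direction in {-1, 0, 1}^n from a gradient (or an approximation).

    Assumption: components with |grad_i| <= tol are treated as zero, and the
    direction is the negated sign pattern of the remaining components.
    """
    g = np.asarray(grad, dtype=float)
    g_thresholded = np.where(np.abs(g) > tol, g, 0.0)
    return (-np.sign(g_thresholded)).astype(int)

g = np.array([3.0, -0.5, 0.0])      # illustrative gradient values
d = prune_to_singleton(g)

# g . d equals minus the sum of |g_i| over the retained components, so it is
# negative whenever g is nonzero: d is a descent direction for this g.
print(d, float(g @ d))
```

With `tol = 0` the inner product is the negative 1-norm of the gradient, which is why this choice can never point uphill with respect to the gradient it was built from.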
9.2 Future Areas of Research

From this work, several observations and questions have arisen that suggest
topics for future research. They are discussed below.

Additional Study of the Filter Approach. Perhaps the most important drawback
of the filter GPS algorithm is its inability to ensure convergence to a first-order
stationary point. Another is the additional hypotheses necessary to ensure that
certain generalized derivatives are nonnegative.

To address the latter concern, one possible area of future work is to modify the
filter algorithm so that the maximum allowable constraint violation h_max is allowed
to decrease during the iteration sequence. A similar scheme is used by Fletcher et al.
[49] to avoid the problem of blocking entries discussed in Section 4.2.1. The idea of
blocking entries seems to be closely related to the additional hypothesis in Theorems
6.18 and 6.19; i.e., that the current poll center (as opposed to a different filter point)
must filter a trial point in a direction infinitely often in a refining subsequence, in
order to ensure nonnegativity of the generalized directional derivative in that direction.
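
To make the role of the maximum allowable constraint violation concrete, the following is a minimal sketch of a filter acceptance test on pairs (h, f) of constraint violation and objective value. It is a simplification under stated assumptions: the dominance rule, the rejection test `h > h_max`, and the names `dominates` and `try_add` are all hypothetical and do not reproduce the exact definitions used in the filter algorithm.

```python
def dominates(a, b):
    """a = (h, f) dominates b if it is no worse in both measures and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def try_add(filter_pts, cand, h_max):
    """Try to add cand = (h, f) to the filter.

    Assumption: cand is rejected if h exceeds h_max, if it duplicates a filter
    point, or if some filter point dominates it. On acceptance, entries that
    cand dominates are dropped. Returns (accepted, new_filter).
    """
    h, _ = cand
    if h > h_max or any(p == cand or dominates(p, cand) for p in filter_pts):
        return False, list(filter_pts)
    kept = [p for p in filter_pts if not dominates(cand, p)]
    return True, kept + [cand]

filt = [(0.0, 10.0), (2.0, 4.0)]                       # illustrative (h, f) pairs
accepted, filt_after = try_add(filt, (1.0, 6.0), h_max=5.0)
rejected_dominated, _ = try_add(filt, (3.0, 5.0), h_max=5.0)
rejected_tight, _ = try_add(filt, (1.0, 6.0), h_max=0.5)
print(accepted, rejected_dominated, rejected_tight)
```

The last call shows the effect of a smaller `h_max`: the same candidate that was accepted under `h_max = 5.0` is rejected once the allowable violation shrinks below its `h` value.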

However,  this  modification  does  not  appear  to  solve  the  problem  of  trying  to 
converge  to  a  KKT  point  if  the  functions  are  sufficiently  smooth  there.  It  is  not  clear 
that  guaranteed  convergence  is  even  possible  with  a  filter  pattern  search  approach, 
since  being  able  to  generate  the  tangent  cone  at  the  limit  point  would  typically  require 
the algorithm to use an infinite number of directions. The grid-based methods of
Coope  and  Price  [34]  offer  an  alternative  construction  that  allows  an  infinite  number 
of  directions,  but  their  construction  requires  that  the  directions  satisfy  a  uniform 
linear  independence  condition.  Still,  grid-based  methods  appear  to  be  a  promising 
area  for  achieving  first-order  convergence  for  general  NLP  problems. 
GPS with Derivative Information for Nonlinear Constrained Problems.
One major area that remains unexplored and still needs rigorous development is
how to exploit derivative information in the presence of general nonlinear constraints.
Our  current  implementation  in  NOMADm  uses  a  variant  of  the  feasible  directions 
algorithm  [139],  in  which  a  feasible  descent  direction  is  found  and  then  modified  so 
that  the  resulting  descent  direction  is  still  feasible  and  puts  the  next  trial  point  on 
the  mesh.  This  appears  to  force  convergence  to  a  KKT  point,  although  some  informal 
attempts  at  solving  a  relatively  simple  problem  resulted  in  rather  slow  and  inaccurate 
convergence.  At  this  point,  convergence  to  a  KKT  point  is  still  a  conjecture  that  has 
not  been  rigorously  established.  It  is  also  unclear  what,  if  anything,  can  be  said  about 
the  quality  of  the  solution  when  only  some  derivatives  are  available. 

Additional Nonsmooth Analysis. Although a hierarchy of GPS convergence
results is now well established, we have not been able to establish conditions for
which $0 \in \partial f(x)$ but $\nabla f(x)$ either does not exist or is nonzero at a point $x$. We
have examples in which the gradient does not exist at $x$ and $0 \notin \partial f(x)$, even though
generalized directional derivatives are all nonnegative in a set of positive spanning
directions. Of course, if we have strict differentiability at a point having nonnegative
generalized directional derivatives in a set of positive spanning directions, then we
have $0 = \nabla f(x) \in \partial f(x)$. We also have an example of a convex function for which
GPS converges to a point with nonzero gradient. However, the open question is
whether GPS can converge to a point $x$ at which $0 \notin \partial f(x)$, when the function $f$ is
differentiable, but not strictly differentiable.

Probabilistic  Analysis.  Since  GPS  methods  are  directional  based  methods, 
there  is  little  chance  in  practice  of  converging  to  a  stationary  point  that  is  not  a  local 
minimum;  however,  that  possibility  does  exist,  and  there  are  a  number  of  examples 
that illustrate this. In fact, Audet [4] shows a pathological case in which GPS
converges to a global maximum (it starts there and never moves). However, it would
be  interesting  to  see  if  stronger  optimality  results  could  be  proved  in  probability  or 
almost  surely,  if  some  randomness  was  introduced  in  the  selection  of  starting  points 
and  poll  directions,  in  order  to  avoid  the  pathological  cases. 

This  approach  is  different  from  Hart  [68,  69,  70],  who  performs  some  probabilistic 
analysis  of  an  evolutionary  pattern  search  method.  In  his  approach,  randomness  is 
part  of  the  algorithm,  and  convergence  in  probability  is  to  a  first-order  stationary 
point. This is actually a weaker result than that of traditional GPS methods (e.g., under
continuously differentiable $f$, $P(\liminf_k \|\nabla f(x_k)\| = 0) = 1$).

GPS  with  Finite  Differences.  Since  gradients  can  be  used  to  prune  directions 
for  GPS,  there  is  a  certain  argument  for  applying  finite  difference  approximations  in  an 
effort  to  accelerate  convergence  without  requiring  explicit  computation  of  gradients. 
This  is  briefly  explored  in  a  slightly  different  context  by  Coope  and  Price  [34].  Such 
an  implementation  would  have  to  address  the  truncation  error  associated  with  the 
approximation  and  how  that  affects  the  ability  of  the  method  to  guarantee  a  descent 
direction.  Strategies  dealing  with  how  often  to  update  the  approximation  would  also 
have  to  be  explored. 

As an example, suppose that each POLL step uses the standard 2n directions,
[I, -I]. Then a poll that does not find an improved mesh point will cost 2n function
evaluations. However, we can compute a centered-difference approximation to the
gradient at the current poll center using those 2n function evaluations. Of course, the
accuracy of the approximation will depend on how fine the mesh is at that iteration.
At the next iteration, after the mesh size is refined, we can simply poll in one descent
direction, based on the approximate gradient, thus saving 2n - 1 function evaluations.
The main drawback to this approach is the potential inaccuracy of the approximation.
It is possible to generate a descent direction with respect to the approximate gradient
that is actually not a descent direction. Therefore, updating the approximation as
the mesh gets finer may be required to ensure convergence to a first-order stationary
point.
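
The example above can be sketched in a few lines. This is only an illustration under stated assumptions: the objective `f`, the poll center `x`, and the mesh size `delta` are hypothetical values chosen so that the poll is unsuccessful, and the single follow-up direction is taken to be the negated sign vector of the approximate gradient, which is one possible choice rather than a prescribed one.

```python
import numpy as np

def poll_2n(f, x, delta):
    """Evaluate f at x + delta*e_i and x - delta*e_i for each coordinate i."""
    n = len(x)
    f_plus, f_minus = np.empty(n), np.empty(n)
    for i in range(n):
        step = np.zeros(n)
        step[i] = delta
        f_plus[i] = f(x + step)
        f_minus[i] = f(x - step)
    return f_plus, f_minus

def centered_gradient(f_plus, f_minus, delta):
    """Centered-difference gradient built from the 2n poll values (no new evaluations)."""
    return (f_plus - f_minus) / (2.0 * delta)

f = lambda x: (x[0] - 1.0) ** 2 + 3.0 * x[1] ** 2   # hypothetical smooth objective
x = np.array([1.25, 0.0])                            # hypothetical poll center
delta = 0.5

fp, fm = poll_2n(f, x, delta)
poll_failed = min(fp.min(), fm.min()) >= f(x)        # no strictly better mesh point
g = centered_gradient(fp, fm, delta)

# After refining the mesh, poll in a single direction derived from g.
d = -np.sign(g)
trial = x + (delta / 2.0) * d
print(poll_failed, g, trial, f(trial) < f(x))
```

For this quadratic the centered difference is exact, so the single trial point succeeds; for a general function the approximation error grows with the mesh size, which is the drawback noted above.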

The  use  of  finite  difference  approximations  is  not  limited  to  mesh  local  optimizers. 
Even  if  an  improved  mesh  point  is  found,  the  function  values  of  the  new  and  old 
iterates  generate  a  backward-difference  approximation  to  the  directional  derivative  at 
the  new  iterate  in  the  successful  direction. 

Increasing  Flexibility  in  the  SEARCH  Step.  In  many  engineering  applications 
with  very  expensive  function  evaluations,  a  typical  SEARCH  step  consists  of  optimizing 
a  less  costly  surrogate  function  and  mapping  the  resulting  optimal  point  (or  several 
“close-to-optimal”  points)  to  the  mesh.  However,  the  forcing  of  points  to  the  mesh  in 
order  to  satisfy  GPS  theory  can  often  lead  to  a  point  which  is  not  an  improvement  over 
the  incumbent  design.  Since  this  scenario  can  prove  costly,  adding  more  flexibility  in 
the  SEARCH  step  is  something  that  is  worth  pursuing. 
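
A small sketch may help show why forcing a surrogate optimum onto the mesh can discard the improvement. It assumes the simplest mesh, all points of the form x0 + delta*z with integer z (as generated by the directions [I, -I]); the function `snap_to_mesh`, the objective `f`, and all numerical values are hypothetical.

```python
import numpy as np

def snap_to_mesh(y, x0, delta):
    """Nearest point to y on the assumed mesh {x0 + delta*z : z an integer vector}."""
    z = np.rint((np.asarray(y, dtype=float) - x0) / delta)
    return x0 + delta * z

f = lambda x: (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2   # hypothetical true objective
x_inc = np.zeros(2)                                    # incumbent, also the mesh origin
y_surrogate = np.array([0.3, -0.2])                    # pretend surrogate optimum

coarse = snap_to_mesh(y_surrogate, x_inc, delta=1.0)   # snaps back onto the incumbent
fine = snap_to_mesh(y_surrogate, x_inc, delta=0.25)    # lands near the surrogate optimum

print(coarse, f(coarse) < f(x_inc))
print(fine, f(fine) < f(x_inc))
```

On the coarse mesh the mapped point coincides with the incumbent, so the SEARCH step yields nothing, while on the finer mesh the mapped point is a genuine improvement.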

Two  approaches  appear  to  be  worth  considering.  First,  despite  the  lack  of  sup¬ 
porting  theory,  we  have  observed  in  practice  that  success  has  been  achieved  using 
surrogates  without  mapping  the  resulting  point  to  the  mesh.  We  observe  that  one 
can  conduct  the  SEARCH  on  a  different,  more  refined  mesh  than  what  is  used  in  the 
POLL step, even though the theory does not explicitly allow it. The reason for this is
that if the SEARCH step finds an improved mesh point (or an unfiltered point), then
POLL is not performed, and if SEARCH does not find an improved mesh point (or
unfiltered point), then it can be viewed as never having been performed. There is a certain
argument that if the SEARCH step is performed on an ultra-fine mesh, on the level of
machine precision, then the computer's floating point arithmetic acts as the function
that forces points onto the mesh. While noting its effectiveness in practice, we have
been unable to rigorously prove that this works, and it remains an open question
for further study. One complication is the non-uniformity of machine-representable
numbers.

The other approach, which is potentially more fruitful, is the grid-based method
of Coope and Price [35]. This method is similar to the basic GPS in that points are
evaluated on a grid that is generated by a set of positive spanning directions. But
unlike GPS methods, it allows the evaluation of points not on the grid. An attempt
is made to move back to the grid in subsequent iterations, but in certain cases, the
next grid is translated to the new point instead. The drawback to this approach is
that infinite refinement of the mesh must be assumed, whereas this is proved for GPS
methods. Grid-based methods also require a sufficient decrease condition (e.g., see
[41]) to guarantee convergence, whereas GPS methods require only simple decrease
in the objective function value.

As  previously  discussed,  one  other  benefit  of  grid-based  methods  is  that  they  al¬ 
low  grids  to  be  constructed  using  an  infinite  set  of  directions,  which  can  be  beneficial 
in  devising  a  derivative-free  method  that  converges  to  a  KKT  point.  However,  this 
added  flexibility  generates  a  requirement  that  the  directions  satisfy  a  uniform  lin¬ 
ear  independence  condition.  Serious  implementation  issues  must  be  addressed,  but 
the  potential  for  solving  nonlinearly  constrained  problems  makes  this  an  attractive 
research  area. 

Hybrid  Algorithms.  Finally,  since  the  SEARCH  step  provides  the  user  with  great 
flexibility,  it  also  allows  the  user  to  construct  hybrid  algorithms.  The  search  heuris¬ 
tics  mentioned  in  Section  2.2  are  certainly  among  the  possibilities  for  this  option.  For 
example,  since  the  simulated  annealing  algorithm  (see  Section  2.2.1)  can  be  guaran¬ 
teed,  under  certain  conditions,  to  converge  in  probability  to  the  global  optimum,  it 
would  be  interesting  to  find  out  what  can  be  proved  if  it  were  applied  to  a  surrogate 
function  within  the  SEARCH  step  of  a  GPS  algorithm.  It  may  be  possible  to  construct 
the hybrid in a way that would retain GPS theoretical properties and performance,
but  would  also  preserve  the  global  quality  of  the  solution  that  simulated  annealing 
theory  provides.  This  may  be  possible  if  each  surrogate  function  can  be  proved  to 
converge  to  its  corresponding  true  function  in  the  limit. 

Computational Studies. There do not appear to be any significant computational
studies that involve GPS methods, either comparisons of different implementations
or comparisons with other algorithms. In particular, a computational study
of the augmented Lagrangian approach of Lewis and Torczon [95] and the filter GPS
method of Audet and Dennis [7] would be very useful. While the former guarantees
convergence to a KKT point for sufficiently smooth functions (which the latter
does not), it requires the user to deal with a penalty parameter and derivative-free
Lagrange multiplier estimates, which can significantly affect numerical performance
and tend to be highly problem-dependent.

Test  Problem  Sets.  Related  to  such  studies  is  the  need  for  a  suite  of  engineering 
test  problems  similar  to  the  classes  of  problems  targeted  here.  Specifically,  there  are 
very  few  readily  available  test  problems  in  which  functions  are  expensive  black  box 
simulations  with  no  limitations  as  to  the  local  smoothness  properties  or  variable 
types.  A  collection  of  such  problems  would  be  of  great  benefit  to  the  optimization 
community  in  evaluating  performance  of  GPS  and  other  algorithms. 

Other  Software  Issues.  In  addition  to  the  areas  described  here,  several  im¬ 
provements  can  be  made  with  respect  to  the  NOMADm  software  package.  Perhaps 
the  most  important  and  helpful  improvement  would  be  to  make  it  more  adaptable 
to  other  software  packages  and  codes.  Currently,  the  only  capability  NOMADm 
has  in  this  area  is  the  ability  to  interface  with  pre-compiled  or  Matlab®-compiled 
FORTRAN-77  function  files.  It  would  be  a  much  more  attractive  package  if  it  could 
be  linked  with  C,  C++,  and  AMPL®  [53]  files. 
Appendix A
Test  Problems 

The  purpose  of  this  chapter  is  to  provide  the  reader  with  additional  data  with  respect 
to  the  computational  results  recorded  in  this  document.  Specifically,  we  now  provide 
a  mathematical  description  of  each  of  the  18  CUTE  [18]  test  problems  referred  to 
in  Chapter  8  (see  Section  8.5)  and  solved  using  the  NOMADm  software.  The  other 
problem  addressed  numerically  in  this  document  (i.e.,  the  thermal  insulation  system 
design  problem)  is  described  in  detail  in  Section  7.2.  Matlab®  source  code  for  all  of 
these  problems,  along  with  the  entire  NOMADm  source  code,  is  available  for  download 
on  the  internet  [1]. 

ALLINITU:

\[
\begin{aligned}
\min f(x_1, x_2, x_3, x_4) = {} & x_3 - 1 + x_1^2 + x_2^2 + (x_3 + x_4)^2 + \sin(x_4)^4 \\
& + 2\sin(x_3)^2 + x_1^2 x_2^2 + x_4 - 3 + (x_4 - 1)^2 + x_2^4 \\
& + [x_3^2 + (x_1 + x_4)^2]^2 + [x_1 - 4 + \sin(x_4)^2 + x_2^2 x_3^2]^2
\end{aligned}
\]

Initial Point: not specified (used $(x_1, x_2, x_3, x_4) = (0, 1, 0.5, 2)$).
ALLINIT:

\[
\begin{aligned}
\min f(x_1, x_2, x_3, x_4) = {} & x_3 - 1 + x_1^2 + x_2^2 + (x_3 + x_4)^2 + \sin(x_4)^4 \\
& + 2\sin(x_3)^2 + x_1^2 x_2^2 + x_4 - 3 + (x_4 - 1)^2 + x_2^4 \\
& + [x_3^2 + (x_1 + x_4)^2]^2 + [x_1 - 4 + \sin(x_4)^2 + x_2^2 x_3^2]^2
\end{aligned}
\]

subject to
\[
0 \le x_2, \qquad -10^{10} \le x_3 \le 1, \qquad x_4 = 2
\]

Initial Point: not specified (used $(x_1, x_2, x_3, x_4) = (0, 1, 0.5, 2)$).

BARD:

\[
\min f(x_1, x_2, x_3) = \sum_{j=1}^{15} \left[ Y_j - \left( x_1 + \frac{j}{x_2 (16 - j) + x_3 \min\{j, 16 - j\}} \right) \right]^2
\]

where $Y = (0.14, 0.18, 0.22, 0.25, 0.29, 0.32, 0.35, 0.39, 0.37, 0.58, 0.73, 0.96, 1.34, 2.10, 4.39)$.

Initial Point: $(x_1, x_2, x_3) = (1, 1, 1)$

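BOX2:

\[
\min f(x_1, x_2, x_3) = \sum_{j=1}^{10} \left( e^{-0.1 j x_1} - e^{-0.1 j x_2} - x_3 e^{-0.1 j} + x_3 e^{-j} \right)^2
\]

subject to $x_3 = 1$

Initial Point: $(x_1, x_2, x_3) = (0, 10, 1)$

BOX3:

\[
\min f(x_1, x_2, x_3) = \sum_{j=1}^{10} \left( e^{-0.1 j x_1} - e^{-0.1 j x_2} - x_3 e^{-0.1 j} + x_3 e^{-j} \right)^2
\]

Initial Point: $(x_1, x_2, x_3) = (0, 10, 1)$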


DENSCHNA:

\[
\min f(x_1, x_2) = x_1^4 + (x_1 + x_2)^2 + (e^{x_2} - 1)^2
\]

Initial Point: $(x_1, x_2) = (1, 1)$

DENSCHNB:

\[
\min f(x_1, x_2) = (x_1 - 2)^2 + [x_2 (x_1 - 2)]^2 + (x_2 + 1)^2
\]

Initial Point: $(x_1, x_2) = (1, 1)$

DENSCHNC:

\[
\min f(x_1, x_2) = (x_1^2 + x_2^2 - 2)^2 + [e^{x_1 - 1} + x_2^3 - 2]^2
\]

Initial Point: $(x_1, x_2) = (2, 3)$
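
As a quick sanity check on the three DENSCHN objectives, the sketch below codes them in Python and evaluates each at a point where every squared term vanishes. The formulas are written under the assumption that the problems take their standard CUTE forms (fourth power on the first DENSCHNA term, a squared last term in DENSCHNB, and a cubic in the second DENSCHNC term); the function names are hypothetical.

```python
import math

def denschna(x1, x2):
    # Assumed form: x1^4 + (x1 + x2)^2 + (exp(x2) - 1)^2
    return x1 ** 4 + (x1 + x2) ** 2 + (math.exp(x2) - 1.0) ** 2

def denschnb(x1, x2):
    # Assumed form: (x1 - 2)^2 + [x2 (x1 - 2)]^2 + (x2 + 1)^2
    return (x1 - 2.0) ** 2 + (x2 * (x1 - 2.0)) ** 2 + (x2 + 1.0) ** 2

def denschnc(x1, x2):
    # Assumed form: (x1^2 + x2^2 - 2)^2 + [exp(x1 - 1) + x2^3 - 2]^2
    return (x1 ** 2 + x2 ** 2 - 2.0) ** 2 + (math.exp(x1 - 1.0) + x2 ** 3 - 2.0) ** 2

# Each objective is a sum of squares (or even powers), so a value of zero is a global minimum.
print(denschna(0.0, 0.0), denschnb(2.0, -1.0), denschnc(1.0, 1.0))

# Objective value at the stated DENSCHNB initial point (1, 1).
print(denschnb(1.0, 1.0))
```

Under these assumed forms, the points (0, 0), (2, -1), and (1, 1) give an objective value of zero for DENSCHNA, DENSCHNB, and DENSCHNC respectively.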

EXPFIT:

\[
\min f(x_1, x_2) = \sum_{j=1}^{10} \left( x_1 e^{0.25 j x_2} - 0.25 j \right)^2
\]

Initial Point: $(x_1, x_2) = (1, 1)$

MARATOSB:

\[
\min f(x_1, x_2) = x_1 + 10^{6} (x_1^2 + x_2^2 - 1)^2
\]

Initial Point: $(x_1, x_2) = (0, 0)$

MDHOLE:

\[
\min f(x_1, x_2) = x_1 + 10^{2} (\sin(x_1) - x_2)^2
\]

subject to $x_1 \ge 0$

Initial Point: $(x_1, x_2) = (10, 10)$


MEXHAT:

\[
\min f(x_1, x_2) = -2 (x_1 - 1)^2 + 10^{4} \left( -0.02 + 10^{-4} (x_2 - x_1^2)^2 + (x_1 - 1)^2 \right)^2
\]

Initial Point: $(x_1, x_2) = (0.86, 0.72)$

MEYER3:

\[
\min f(x_1, x_2, x_3) = \sum_{j=1}^{16} \left[ Y_j - x_1 \exp\left( \frac{x_2}{45 + 5j + x_3} \right) \right]^2,
\]

where $Y = (34780, 28610, 23650, 19630, 16370, 13720, 11540, 9744, 8261, 7030, 6005, 5147, 4427, 3820, 3307, 2872)$.

Initial Point: $(x_1, x_2, x_3) = (0.02, 4000, 250)$

OSBORNEA:

\[
\min f(x_1, x_2, x_3, x_4, x_5) = \sum_{j=1}^{33} \left[ Y_j - \left( x_1 + x_2 e^{-10 (j-1) x_4} + x_3 e^{-10 (j-1) x_5} \right) \right]^2,
\]

where

Y = (0.844, 0.908, 0.932, 0.936, 0.925, 0.908, 0.881, 0.850, 0.818, 0.784, 0.751,
0.718, 0.685, 0.658, 0.628, 0.603, 0.580, 0.558, 0.538, 0.522, 0.506, 0.490,
0.478, 0.467, 0.457, 0.448, 0.438, 0.431, 0.424, 0.420, 0.414, 0.411, 0.406).

Initial Point: $(x_1, x_2, x_3, x_4, x_5) = (0.5, 1.5, -1, 0.01, 0.02)$

OSBORNEB:

\[
\min f(x_1, \ldots, x_{11}) = \sum_{j=1}^{65} \left[ Y_j - (A_j + B_j + C_j + D_j) \right]^2
\]

where

\[
\begin{aligned}
A_j &= x_1 e^{-0.1 (j-1) x_5}, \\
B_j &= x_2 e^{-x_6 (0.1 (j-1) - x_9)^2}, \\
C_j &= x_3 e^{-x_7 (0.1 (j-1) - x_{10})^2}, \\
D_j &= x_4 e^{-x_8 (0.1 (j-1) - x_{11})^2},
\end{aligned}
\]

and

Y = (1.366, 1.191, 1.112, 1.013, 0.991, 0.885, 0.831, 0.847, 0.786, 0.725,
0.746, 0.679, 0.608, 0.655, 0.616, 0.606, 0.602, 0.626, 0.651, 0.724,
0.649, 0.649, 0.694, 0.644, 0.624, 0.661, 0.612, 0.558, 0.533, 0.495,
0.500, 0.423, 0.395, 0.375, 0.372, 0.391, 0.396, 0.405, 0.428, 0.429,
0.523, 0.562, 0.607, 0.653, 0.672, 0.708, 0.633, 0.668, 0.645, 0.632,
0.591, 0.559, 0.597, 0.625, 0.739, 0.710, 0.729, 0.720, 0.636, 0.581,
0.428, 0.292, 0.162, 0.098, 0.054).

Initial Point: $(x_1, \ldots, x_{11}) = (1.3, 0.65, 0.65, 0.7, 0.6, 3, 5, 7, 2, 4.5, 5.5)$

OSLBQP:

\[
\min f(x_1, \ldots, x_8) = x_1 + 2 x_5 - x_8 + \frac{1}{2} \sum_{j=1}^{8} x_j^2
\]

subject to
\[
\begin{aligned}
2.5 &\le x_1, \\
0 &\le x_2 \le 4.1, \\
0 &\le x_3, \\
0 &\le x_4, \\
0.5 &\le x_5 \le 4.0, \\
0 &\le x_6, \\
0 &\le x_7, \\
0 &\le x_8 \le 4.3
\end{aligned}
\]

Initial Point: $(x_1, \ldots, x_8) = (0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)$
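
Because the OSLBQP objective is separable and convex, its bound-constrained minimizer can be written down directly, which gives a convenient check on any solver output. The sketch below assumes the objective is x_1 + 2 x_5 - x_8 plus half the sum of squares, with the bounds listed above; the variable names are hypothetical.

```python
import numpy as np

# Assumed problem data: linear coefficients and simple bounds for x_1, ..., x_8.
c = np.array([1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, -1.0])
lower = np.array([2.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0])
upper = np.array([np.inf, 4.1, np.inf, np.inf, 4.0, np.inf, np.inf, 4.3])

def oslbqp(x):
    """Assumed objective: c.x + 0.5 * ||x||^2."""
    return float(c @ x + 0.5 * (x @ x))

# Each coordinate minimizes c_i*x_i + 0.5*x_i^2 on its interval, so the solution
# is the unconstrained minimizer -c_i clipped to [lower_i, upper_i].
x_star = np.clip(-c, lower, upper)
f_star = oslbqp(x_star)

x_init = np.full(8, 0.5)        # the stated initial point (infeasible in x_1)
print(x_star, f_star, oslbqp(x_init))
```

Under these assumptions the minimizer is (2.5, 0, 0, 0, 0.5, 0, 0, 1) with objective value 6.25.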

PALMER1:

\[
\min f(x_1, x_2, x_3, x_4) = \sum_{j=1}^{31} \left[ Y_j - \left( x_1 X_j^2 + \frac{x_2}{x_3 + X_j^2 / x_4} \right) \right]^2
\]

subject to $x_i \ge 10^{-4}$, $i = 2, 3, 4$,

where

X = (-1.788963, -1.745329, -1.658063, -1.570796, -1.483530, -1.396263,
-1.308997, -1.218612, -1.134464, -1.047198, -0.872665, -0.698132,
-0.523599, -0.349066, -0.174533, 0.000000, 1.788963, 1.745329,
1.658063, 1.570796, 1.483530, 1.396263, 1.308997, 1.218612,
1.134464, 1.047198, 0.872665, 0.698132, 0.523599, 0.349066, 0.174533),

Y = (78.596218, 65.77963, 43.96947, 27.038816, 14.6126, 6.2614,
1.538330, 0.00000, 1.188045, 4.6841, 16.9321, 33.6988,
52.3664, 70.1630, 83.4221, 88.3995, 78.596218, 65.77963,
43.96947, 27.038816, 14.6126, 6.2614, 1.538330, 0.00000,
1.188045, 4.6841, 16.9321, 33.6988, 52.3664, 70.1630, 83.4221).

Initial Point: $(x_1, x_2, x_3, x_4) = (1, 1, 1, 1)$

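PALMER1A:

\[
\min f(x_1, \ldots, x_6) = \sum_{j=1}^{31} \left[ Y_j - \left( x_1 + x_2 X_j^2 + x_3 X_j^4 + x_4 X_j^6 + \frac{x_5}{x_6 + X_j^2} \right) \right]^2
\]

subject to $x_i \ge 0$, $i = 5, 6$,

where X and Y are defined in problem PALMER1.
Initial Point: $(x_1, x_2, x_3, x_4, x_5, x_6) = (1, 1, 1, 1, 1, 1)$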

PALMER1B:

\[
\min f(x_1, x_2, x_3, x_4) = \sum_{j=1}^{31} \left[ Y_j - \left( x_1 X_j^2 + x_2 X_j^4 + \frac{x_3}{x_4 + X_j^2} \right) \right]^2
\]

subject to $x_i \ge 0$, $i = 3, 4$,

where X and Y are defined in problem PALMER1.
Initial Point: $(x_1, x_2, x_3, x_4) = (1, 1, 1, 1)$


PALMER1C:

\[
\min f(x_1, \ldots, x_8) = \sum_{j=1}^{31} \left[ Y_j - \sum_{k=0}^{7} x_{k+1} X_j^{2k} \right]^2
\]

where X and Y are defined in problem PALMER1.
Initial Point: $(x_1, \ldots, x_8) = (1, 1, 1, 1, 1, 1, 1, 1)$


